Calculate the Grubbs' test statistic, as used by the Grubbs function, to detect outliers. Compare the Grubbs' test statistic with the test statistics of the outliers.
1. Define a data set describing a heat flow experiment and plot it.
2. Define the critical value of the Student's t-distribution with N - 2 degrees of freedom and a significance level of alpha/(2N).
The function qt calculates the inverse cumulative probability distribution of the Student's t-distribution.
3. Define the Grubbs' test statistic as a function of alpha.
4. Define the level of significance for a confidence level of 90%.
5. Call the Grubbs function to detect outliers.
The Grubbs function can accept a matrix as an input, in which case it returns nested pairs of indices for the array locations of the outliers.
6. Compare the Grubbs' test statistic with the test statistics of the outliers.
The two outliers have a test statistic greater than the Grubbs' test statistic. Even if more than one index is returned, not every candidate is necessarily an outlier: both the critical value and the test statistics depend on N, so they change when a candidate is removed from the data set.
Because the Grubbs' test assumes that the data is normally distributed, it is worth checking that your data follows a normal distribution before proceeding. For example, you can use a visual test such as the normal probability plot.
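Outside the worksheet, steps 2 through 6 above can be sketched in Python using only the standard library. This is a minimal illustration, not the implementation of the Grubbs function: the names t_cdf, t_quantile, grubbs_critical, and grubbs_statistics are hypothetical, t_quantile stands in for qt by numerically integrating and inverting the t density, and the data values are illustrative rather than the heat flow data. It assumes, as the comparison in step 6 suggests, that candidates are the points whose statistic exceeds the critical value.

```python
import math
import statistics


def t_cdf(t, df, n=2000):
    """Student's t CDF for t >= 0, via Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / n  # n must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Inverse t CDF for 0.5 < p < 1 by bisection (stands in for qt)."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # grow the bracket until it contains the root
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_critical(alpha, n):
    """Two-sided critical value: t has n - 2 degrees of freedom and
    significance level alpha / (2 n)."""
    t = t_quantile(1 - alpha / (2 * n), n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_statistics(data):
    """Statistic |x - mean| / s for every point (s = sample std deviation)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [abs(x - mean) / s for x in data]


# Illustrative values only; not the heat flow data from the worksheet.
data = [199.31, 199.53, 200.19, 200.82, 201.92, 201.95, 202.18, 245.57]
alpha = 0.10  # 90% confidence level
g_crit = grubbs_critical(alpha, len(data))
flagged = [i for i, g in enumerate(grubbs_statistics(data)) if g > g_crit]
print(g_crit, flagged)
```

The indices in flagged are zero-based. Because removing a point changes N, the mean, and the standard deviation, a stricter procedure would recompute g_crit and the statistics after each removal instead of flagging all candidates in one pass.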
GrubbsClassic
Use the GrubbsClassic function to find the point that is most likely to be an outlier in a data set.
1. Calculate the greatest test statistic, Gmax, for the above data set.
2. Define alpha for a confidence level of 98%.
3. Compare the Grubbs' test statistic with Gmax.
No outliers are detected at this significance level.
4. Call the GrubbsClassic function.
The point returned by GrubbsClassic is not an outlier at this significance level, but it is the data point most likely to be an outlier.
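The idea of locating the single most extreme point can be sketched in Python as follows. This is an illustration of the concept only, not the GrubbsClassic implementation: the function name most_extreme_point and the data values are hypothetical.

```python
import statistics


def most_extreme_point(data):
    """Return (index, statistic) of the point farthest from the mean,
    measured in units of the sample standard deviation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    stats = [abs(x - mean) / s for x in data]
    idx = max(range(len(data)), key=stats.__getitem__)
    return idx, stats[idx]


# Illustrative values only; not the data set used in the worksheet.
sample = [10.1, 9.8, 10.3, 10.0, 9.9, 10.9]
idx, g_max = most_extreme_point(sample)
print(idx, g_max)
```

The returned point is only a candidate. Whether it counts as an outlier depends on comparing g_max with the critical value for the chosen alpha.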
The Limiting Probability of Detecting Outliers
Use the special construct root to calculate the limiting probability at which outliers are detected.
Outliers are detected when alpha is greater than α_limit, or in other words, when the confidence level is smaller than (1 - α_limit):
This is consistent with the above findings. No outliers were detected at a confidence level of 98%, but two outliers were detected at a confidence level of 90%.
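As a cross-check on the root-finding approach, the limiting value can also be written in closed form. The sketch below assumes the critical value has the standard two-sided Grubbs form, with t the critical value of the Student's t-distribution with N - 2 degrees of freedom, and uses F_{N-2} for the cumulative distribution function of that distribution. The symbols G_max and t_limit are introduced here for illustration. Setting the critical value equal to G_max and solving for t gives:

```latex
\begin{aligned}
G_{\max} &= \frac{N-1}{\sqrt{N}}\sqrt{\frac{t_{\text{limit}}^{2}}{N-2+t_{\text{limit}}^{2}}}
\;\Longrightarrow\;
t_{\text{limit}}^{2} = \frac{N\,(N-2)\,G_{\max}^{2}}{(N-1)^{2} - N\,G_{\max}^{2}},\\[4pt]
\alpha_{\text{limit}} &= 2N\,\bigl[\,1 - F_{N-2}(t_{\text{limit}})\,\bigr]
\end{aligned}
```

The expression is valid only while G_max is below (N - 1)/sqrt(N), the largest value the statistic can take for N points. Because the critical value decreases as alpha increases, any alpha above α_limit places the critical value below G_max, which is when an outlier is detected.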