Outlier Detection and Removal
The Grubbs, GrubbsClassic, and ThreeSigma functions detect outliers in data sets. The trim function removes the rows with specified indices from a data set.
• Grubbs(v, a) - Returns the indices of the suspected outliers, the test statistic of each, and its distance from the critical statistic, at significance level a (the probability that a value this extreme arises by chance).
• GrubbsClassic(v, a) - Returns the index of the data point most likely to be an outlier, its test statistic, and its distance from the critical statistic, at significance level a.
• ThreeSigma(v) - Returns the indices of the points in v whose test statistic is greater than 3, together with the value of the test statistic for each of those points.
• trim(v, vindex) - Returns v with the entries (rows) specified by vindex removed.
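GrubbsClassic is based on Grubbs' test, which compares the largest test statistic G with a critical value G_crit = ((N − 1)/√N)·√(t²/(N − 2 + t²)), where t is the upper α/(2N) quantile of Student's t distribution with N − 2 degrees of freedom. The Python sketch below is a hypothetical re-creation, not Mathcad's implementation. It assumes a two-sided test, the sample standard deviation, and that the "distance from the critical statistic" is the difference G − G_crit. The names grubbs_classic, t_quantile_upper, and the helpers are illustrative. The t quantile is computed with the standard library only (Simpson's rule plus bisection), so no SciPy is required.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the density over [0, t] with Simpson's rule."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3


def t_quantile_upper(p, df):
    """Find t with P(T > t) = p (0 < p < 0.5) by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > p:  # widen the bracket until it contains the root
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_classic(values, a):
    """Hypothetical analogue of GrubbsClassic(v, a), two-sided Grubbs' test.

    Returns (index, G, G - G_crit); a positive distance flags an outlier.
    """
    n = len(values)
    if n < 3:
        raise ValueError("Grubbs' test needs at least 3 points")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    # The most extreme point is the one farthest from the mean.
    idx = max(range(n), key=lambda i: abs(values[i] - mean))
    g = abs(values[idx] - mean) / sd
    t = t_quantile_upper(a / (2 * n), n - 2)
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))
    return idx, g, g - g_crit
```

For example, with the ten values 10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 15.0 and a = 0.05, the sketch flags index 9 with G ≈ 2.83 against a critical value of about 2.29. Repeating the test after removing a flagged point gives an iterative variant; whether Grubbs(v, a) works that way is not stated here.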
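The test statistic used to detect outliers is the distance of a point from the mean of the data set, divided by the standard deviation of the data set. For a point v_i this is |v_i − m| / s, where m is the mean and s is the standard deviation.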
When a real matrix is used in place of a vector, the functions that detect outliers report the location of each outlier candidate as a pair of indices (row and column), returned as nested matrices.
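The matrix case can be sketched in Python as follows. The function three_sigma_matrix is hypothetical. It assumes that all matrix entries are pooled into a single data set before the statistic is computed and that each candidate is reported as a (row, column) tuple. How Mathcad actually pools the entries and nests the pairs is not specified here.

```python
import statistics


def three_sigma_matrix(matrix, origin=0):
    """Hypothetical matrix analogue of ThreeSigma.

    Pools every entry into one data set (an assumption) and returns
    (pairs, scores), where pairs holds (row, column) indices offset by `origin`.
    """
    flat = [x for row in matrix for x in row]
    mean = statistics.mean(flat)
    sd = statistics.stdev(flat)  # sample standard deviation (assumption)
    pairs, scores = [], []
    for r, row in enumerate(matrix):
        for c, x in enumerate(row):
            g = abs(x - mean) / sd
            if g > 3:
                pairs.append((r + origin, c + origin))
                scores.append(g)
    return pairs, scores
```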
Arguments
• v is a real vector or matrix representing data points.
• a is a probability, 0 < a < 1.
• vindex is an integer-valued vector of indices into v. The indices are relative to ORIGIN (the index of the first array element, 0 by default).
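The effect of ORIGIN on vindex can be illustrated with a small Python sketch of row removal. The function trim below is a hypothetical analogue, not Mathcad's implementation: it treats v as a list of rows (or of scalars, for a vector), interprets vindex relative to the origin parameter, ignores duplicate indices, and raises IndexError for out-of-range indices, which is an assumption about error handling.

```python
def trim(rows, vindex, origin=0):
    """Hypothetical analogue of trim(v, vindex): drop the rows listed in vindex.

    Indices in vindex are relative to `origin`, mirroring Mathcad's ORIGIN.
    """
    drop = {i - origin for i in vindex}  # convert to 0-based; duplicates collapse
    for i in drop:
        if not 0 <= i < len(rows):
            raise IndexError(f"index {i + origin} is out of range")
    return [row for i, row in enumerate(rows) if i not in drop]
```

With the default origin of 0, trim([1, 2, 3, 4], [1, 3]) keeps [1, 3]; with origin=1 the same vindex removes the first and third entries and keeps [2, 4].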