Goodness of Fit
When the data points cluster around a straight line on the probability plot, the selected distribution fits the data well; however, the goodness of fit cannot be gauged easily when the sample is very small. Although there are several more complex statistical measures for determining the most appropriate distribution for a set of data, Table 7-8 describes the simple measures that are generally used to evaluate Weibull probability plots.
Table 7-8. Goodness of Fit Measures
| Measure | Description |
| --- | --- |
| Correlation Coefficient (r) | Measures the strength of the linear relationship between two variables. The correlation coefficient is always a number between -1 and +1, and its sign matches the sign of the slope of the fitted line. Because Weibull probability plots always have positive slopes, they always have positive correlation coefficients. The closer r is to 1, the better the fit. |
| Correlation Coefficient Squared (r²) | Measures the proportion of the variation in the data that is explained by the fit to the distribution. For example, if r² equals 0.93, 93 percent of the variation in the data is explained by the fit. The r² value is also known as the coefficient of determination. |
| Critical Correlation Coefficient (CCC) | Is derived from the distribution of the correlation coefficient for ideal Weibull probability plots, based on simulations that use median rank plotting positions. The 90 percent CCC is compared to the correlation coefficient. If r is greater than the CCC, the fit is good. If r is smaller than the CCC, the data are significantly different from a Weibull distribution, and the fit is bad. CCC is considered the best statistical practice for determining how well the distribution fits the data set. |
| Critical Correlation Coefficient Squared (CCC²) | The squared counterpart of the CCC, used with r² for the regression fit method. A good fit occurs when r² is greater than or equal to CCC². |
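The measures in Table 7-8 can be illustrated with a short Python sketch. It is not the procedure used by any particular software package; it rests on several assumptions that do not come from the text above: median ranks are approximated with Benard's formula, the 90 percent CCC is taken as the value that 90 percent of simulated ideal Weibull plots exceed (the 10th percentile of r), and the failure times are made-up numbers. The function names are hypothetical.

```python
import numpy as np

def median_ranks(n):
    """Median rank plotting positions (assumption: Benard's approximation)."""
    i = np.arange(1, n + 1)
    return (i - 0.3) / (n + 0.4)

def weibull_plot_r(times):
    """Correlation coefficient r of the points on a Weibull probability plot."""
    t = np.sort(np.asarray(times, dtype=float))
    x = np.log(t)                                     # ln(time)
    y = np.log(-np.log(1.0 - median_ranks(len(t))))   # ln(ln(1 / (1 - F)))
    return float(np.corrcoef(x, y)[0, 1])

def critical_r(n, level=0.90, trials=5000, seed=0):
    """Simulated CCC: the r value exceeded by `level` of ideal Weibull samples.

    r does not depend on the Weibull shape or scale, so shape 1 is used.
    """
    rng = np.random.default_rng(seed)
    r_values = [weibull_plot_r(rng.weibull(1.0, size=n)) for _ in range(trials)]
    return float(np.quantile(r_values, 1.0 - level))

# Made-up failure times, for illustration only.
failures = [150.0, 85.0, 250.0, 240.0, 135.0, 200.0, 190.0, 310.0]

r = weibull_plot_r(failures)
r_squared = r ** 2
ccc = critical_r(len(failures))

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
print(f"90% CCC = {ccc:.4f}, CCC^2 = {ccc ** 2:.4f}")
print("good fit" if r > ccc else "bad fit")
```

Because the CCC comes from simulation, its value shifts slightly with the number of trials and the random seed, and it depends on the sample size.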
To compare the fit of one distribution with another, you generally need to have 20 or more data points in the sample, and you must know the P value for the correlation coefficient (r) for each distribution. The distribution with the highest P value is the best statistical choice.
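The comparison can be sketched in Python as follows. The text above does not say how the P value is calculated, so this sketch assumes one simulation-based definition: the fraction of ideal samples from the candidate distribution whose correlation coefficient falls below the observed r. Benard's approximation for median ranks, the choice of Weibull and lognormal as the two candidates, the synthetic 25-point sample, and all function names are likewise assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist

def median_ranks(n):
    """Median rank plotting positions (assumption: Benard's approximation)."""
    i = np.arange(1, n + 1)
    return (i - 0.3) / (n + 0.4)

def plot_r(values, dist):
    """Correlation coefficient r of the probability plot for `dist`."""
    t = np.sort(np.asarray(values, dtype=float))
    f = median_ranks(len(t))
    if dist == "weibull":
        y = np.log(-np.log(1.0 - f))
    elif dist == "lognormal":
        y = np.array([NormalDist().inv_cdf(p) for p in f])
    else:
        raise ValueError(f"unsupported distribution: {dist}")
    return float(np.corrcoef(np.log(t), y)[0, 1])

def p_value(values, dist, trials=2000, seed=0):
    """Assumed definition: share of ideal samples with r below the observed r."""
    rng = np.random.default_rng(seed)
    n = len(values)
    r_obs = plot_r(values, dist)  # also rejects unsupported distributions
    if dist == "weibull":
        draw = lambda: rng.weibull(1.0, size=n)
    else:
        draw = lambda: rng.lognormal(0.0, 1.0, size=n)
    r_sim = np.array([plot_r(draw(), dist) for _ in range(trials)])
    return float(np.mean(r_sim < r_obs))

# Synthetic sample of 25 failure times, for illustration only.
sample = np.random.default_rng(42).weibull(2.0, size=25) * 1000.0

p_values = {dist: p_value(sample, dist) for dist in ("weibull", "lognormal")}
best = max(p_values, key=p_values.get)

for dist, p in p_values.items():
    print(f"{dist}: r = {plot_r(sample, dist):.4f}, P value = {p:.3f}")
print("best statistical choice:", best)
```

Under this assumed definition, an r above the 90 percent CCC corresponds to a P value above 0.10, so the two criteria agree with each other.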