1. Define a data set where each column stands for one variable.
2. Plot the data set.
In this graph, the x-y plane and the x-z plane are superimposed to expose the trend in the data. The data is in fact an elliptical cloud of points that almost lies on a plane. The three variables are linearly related, and the deviation from a perfect plane is due to noise.
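The worksheet's actual data values are not given here. As a stand-in, the following Python sketch generates a hypothetical data set with the same character: three columns that are nearly linear functions of one another, so the points form a noisy, elongated cloud close to a plane. All values and names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical synthetic data: 200 observations of three variables.
# Each row is one observation; each column is one variable.
rng = np.random.default_rng(0)
n = 200
t = rng.normal(0.0, 5.0, n)   # spread along the long axis of the elliptical cloud
s = rng.normal(0.0, 1.0, n)   # smaller spread along the short axis
x = t + 10.0
y = 0.5 * t + s + 3.0
z = 2.0 * x - y + rng.normal(0.0, 0.05, n)  # z is almost exactly 2x - y: a plane plus small noise
data = np.column_stack([x, y, z])
print(data.shape)  # (200, 3)
```

The small noise term on `z` plays the role of the deviation from a perfect plane described above.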
3. Use the rows and cols functions to define the row and column indices.
4. Use the mean function to find the mean of each variable (each column of the data), then subtract that mean from its column to center the data.
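Steps 3 and 4 can be sketched in Python as follows. This is an illustration on a small hypothetical matrix, not the worksheet's code; `data.shape` stands in for the rows and cols functions, and NumPy broadcasting subtracts each column's mean from that column.

```python
import numpy as np

# Hypothetical 4 x 3 data matrix: rows are observations, columns are variables.
data = np.array([[1.0, 2.0, 3.0],
                 [2.0, 4.0, 5.0],
                 [3.0, 6.0, 8.0],
                 [4.0, 8.0, 8.0]])

n_rows, n_cols = data.shape      # analogous to rows(Data) and cols(Data)
col_means = data.mean(axis=0)    # one mean per variable (column): 2.5, 5.0, 6.0
centered = data - col_means      # broadcasting subtracts each column's own mean
print(col_means)
print(centered.mean(axis=0))     # every column of the centered data has mean zero
```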
5. Plot the centered data.
• The data is now centered about the origin. This is one of the steps that the Nipals function carries out automatically.
• In many applications of PCA, it is also desirable to scale the data so that the variables have equal weights, for example when different variables have different units. Scaling each variable (each column of Data) to unit variance is common, but not appropriate for this data, so no scaling is used here.
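For contrast only, here is what the unit-variance scaling mentioned above would look like. The tutorial does not scale its data; this Python sketch uses a hypothetical matrix whose columns have very different magnitudes, the situation in which scaling is usually applied.

```python
import numpy as np

# Hypothetical data whose columns are on very different scales (e.g. different units).
data = np.array([[1.0, 200.0, 0.01],
                 [2.0, 180.0, 0.03],
                 [3.0, 260.0, 0.02],
                 [4.0, 240.0, 0.05]])

centered = data - data.mean(axis=0)
std = centered.std(axis=0, ddof=1)  # sample standard deviation of each column
scaled = centered / std             # each column now has unit variance, so no variable dominates
print(scaled.std(axis=0, ddof=1))   # all entries equal 1 up to rounding
```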
6. Use the Nipals function to create a new variable space. Use three principal components, the maximum possible because the data has only three variables.
The output of the Nipals function is a nested matrix of six individual matrices. Use it to find the loadings, scores, eigenvalues, and eigenvectors of the data. If more components are needed, pass the last two matrices to the Nipals2 function to extract them.
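To show what a NIPALS-style decomposition does internally, here is a minimal Python sketch. The helper `nipals_pca` is hypothetical: it is not the Nipals function, and its return values (scores and loadings only) do not mirror that function's six-matrix output. It assumes already-centered input, finds each component by alternating score and loading updates, then deflates the data before looking for the next component.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-12, max_iter=1000):
    """Hypothetical NIPALS PCA sketch; X must be centered (rows = observations)."""
    X = np.array(X, dtype=float)            # copy, so the caller's array is not modified
    n, m = X.shape
    T = np.zeros((n, n_components))         # scores, one column per component
    P = np.zeros((m, n_components))         # loadings, one unit-length column per component
    for k in range(n_components):
        t = X[:, np.argmax(X.var(axis=0))].copy()  # start from the highest-variance column
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)           # loading estimate from the current score
            p /= np.linalg.norm(p)          # normalize the loading to unit length
            t_new = X @ p                   # score estimate from the new loading
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, k] = t
        P[:, k] = p
        X -= np.outer(t, p)                 # deflate: remove this component's contribution
    return T, P

# Usage on hypothetical near-planar data like the example above.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 5.0, 100)
b = rng.normal(0.0, 1.0, 100)
data = np.column_stack([a, 0.5 * a + b, 1.5 * a - b + rng.normal(0.0, 0.05, 100)])
centered = data - data.mean(axis=0)
scores, loadings = nipals_pca(centered, 3)
print(scores.var(axis=0, ddof=1))           # component variances, largest first
```

Deflation is the design choice that lets NIPALS extract components one at a time, which is why additional components can be requested later without redoing the earlier ones.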
Loadings and Scores
1. Call the loadings function to retrieve the loading vectors stored in the second matrix of NIPALS_Result.
Each column of LOADINGS is a loading vector.
2. Call the scores function to retrieve the scores stored in the first matrix of NIPALS_Result.
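The scores give the proportions in which the loading vectors are combined to recreate each observation of the centered data; think of them as intensities. Because each row of the data is one observation and each column of LOADINGS is a loading vector, the centered data equals SCORES · LOADINGS^T (equivalently, its transpose equals LOADINGS · SCORES^T).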
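The relationship between scores, loadings, and the centered data can be checked numerically. This Python sketch swaps in NumPy's singular value decomposition in place of NIPALS for brevity (the two give the same components up to sign), and uses hypothetical random data with rows as observations.

```python
import numpy as np

# Hypothetical correlated data: 50 observations of 3 variables.
rng = np.random.default_rng(2)
mix = np.array([[3.0, 1.0, 0.5],
                [0.0, 1.0, 0.2],
                [0.0, 0.0, 0.05]])
data = rng.normal(size=(50, 3)) @ mix
centered = data - data.mean(axis=0)

# PCA via SVD (a substitute for NIPALS here): centered = U diag(s) Vt.
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
loadings = Vt.T   # 3 x 3, each column is a loading vector
scores = U * s    # 50 x 3, each row gives one observation's coordinates

print(np.allclose(scores @ loadings.T, centered))    # rows of the data are observations
print(np.allclose(loadings @ scores.T, centered.T))  # the same relation, transposed
```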
3. Plot the data stored in the SCORES matrix.
The data has been rotated so that the first principal component (the first column of SCORES) explains as much of the variance as possible. On the graph, this is the long axis of the elliptical cloud, which is now parallel to the x-axis. The values for the third component, plotted along the z-axis, are very small, so for most purposes this component can be discarded. Keeping only the first two components compresses the data from three variables to two.
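The compression step can be sketched by rebuilding the data from only the first two components and measuring what is lost. This Python example uses hypothetical near-planar data and, again, NumPy's SVD as a stand-in for NIPALS.

```python
import numpy as np

# Hypothetical noisy, nearly planar data (rows = observations).
rng = np.random.default_rng(3)
n = 200
t = rng.normal(0.0, 5.0, n)
u = rng.normal(0.0, 1.0, n)
data = np.column_stack([t, 0.5 * t + u, 1.5 * t - u + rng.normal(0.0, 0.05, n)])
centered = data - data.mean(axis=0)

U, s, Vt = np.linalg.svd(centered, full_matrices=False)
scores, loadings = U * s, Vt.T

k = 2                                       # keep only the first two components
approx = scores[:, :k] @ loadings[:, :k].T  # rank-2 reconstruction of the centered data
rel_err = np.linalg.norm(centered - approx) / np.linalg.norm(centered)
print(scores[:, 2].std(), rel_err)          # both small: the third component is mostly noise
```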
PCA Variance and Eigenvalues
1. Use the PCAvariance function to return the cumulative variances of the three principal components.
The first two components account for 99.9% of the total variance in the system.
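Cumulative variance is the running sum of the component variances divided by their total. The Python sketch below computes it as percentages from the eigenvalues of the sample covariance matrix; the data is hypothetical, and whether PCAvariance reports percentages or fractions is an assumption not settled by this text.

```python
import numpy as np

# Hypothetical near-planar data (rows = observations).
rng = np.random.default_rng(4)
n = 200
t = rng.normal(0.0, 5.0, n)
u = rng.normal(0.0, 1.0, n)
data = np.column_stack([t, 0.5 * t + u, 1.5 * t - u + rng.normal(0.0, 0.05, n)])
centered = data - data.mean(axis=0)

# Component variances are the covariance eigenvalues; eigvalsh returns them ascending.
eigvals = np.linalg.eigvalsh(np.cov(centered, rowvar=False))[::-1]
cum_pct = 100.0 * np.cumsum(eigvals) / eigvals.sum()
print(np.round(cum_pct, 3))  # non-decreasing; the last entry is always 100
```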
2. Use the PCAeigenvals function to extract the eigenvalues of the principal components.
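Each eigenvalue equals the variance of the corresponding column of scores. The Python sketch below demonstrates this with NumPy's `eigh` on the sample covariance matrix (dividing by n - 1). The normalization that PCAeigenvals uses is not stated here, so treat the scale of these values as an assumption; the data is hypothetical.

```python
import numpy as np

# Hypothetical near-planar data (rows = observations).
rng = np.random.default_rng(5)
n = 200
t = rng.normal(0.0, 5.0, n)
u = rng.normal(0.0, 1.0, n)
data = np.column_stack([t, 0.5 * t + u, 1.5 * t - u + rng.normal(0.0, 0.05, n)])
centered = data - data.mean(axis=0)

cov = np.cov(centered, rowvar=False)      # 3 x 3 sample covariance (divides by n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]  # reorder to largest first

scores = centered @ eigvecs               # project the data onto the eigenvectors
print(eigvals)
print(scores.var(axis=0, ddof=1))         # same values: each eigenvalue is one score column's variance
```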