Example: Principal Component Analysis 1
Use the Nipals, loadings, scores, PCAeigenvals and PCAvariance functions to perform Principal Component Analysis (PCA).
The Nipals Function
1. Define a data set where each column stands for one variable.
2. Plot the data set.
In this graph, the x-y plane and the x-z plane are superimposed to expose the trend in the data. The data is in fact an elliptical cloud of points that almost lies on a plane. The three variables are linearly related, and the deviation from a perfect plane is due to noise.
3. Use the rows and cols functions to define the row and column indices.
4. Use the mean function to find the mean of each variable, and then subtract that mean from the variable to center the data.
5. Plot the centered data.
The data is now centered about the origin. This is one of the steps that the Nipals function carries out automatically.
In many applications of PCA, it is also desirable to scale the data so that the variables have equal weights, for example when different variables have different units. Scaling each variable (each column of Data) to unit variance is common, but not appropriate for this data, so no scaling is used here.
6. Use the Nipals function to create a new variable space. Use three principal components, which is the maximum possible because the data set has only three variables.
The output of the Nipals function is a nested matrix of 6 individual matrices. Use it to find the loadings, scores, eigenvalues and eigenvectors of the data. If needed, use the Nipals2 function and the last two matrices to extract additional components.
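The idea behind the algorithm can be sketched outside the worksheet. The following Python/NumPy code is a generic textbook NIPALS iteration, not the implementation behind the Nipals function. The name nipals_pca, its tol and max_iter parameters, and the synthetic data set are all hypothetical and chosen only for illustration. The sketch returns only scores and loadings, not the six-matrix nested result described above.

```python
import numpy as np


def nipals_pca(data, n_components, tol=1e-12, max_iter=1000):
    """Generic NIPALS sketch; returns (scores, loadings) of the centered data."""
    x = data - data.mean(axis=0)                    # center each column (variable)
    scores = np.zeros((x.shape[0], n_components))
    loadings = np.zeros((x.shape[1], n_components))
    for k in range(n_components):
        t = x[:, np.argmax(x.var(axis=0))].copy()   # initial guess for the score vector
        for _ in range(max_iter):
            p = x.T @ t / (t @ t)                   # regress the columns of x on t
            p /= np.linalg.norm(p)                  # unit-length loading vector
            t_new = x @ p                           # updated score vector
            done = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if done:
                break
        scores[:, k] = t
        loadings[:, k] = p
        x = x - np.outer(t, p)                      # deflate before the next component
    return scores, loadings


# Hypothetical data: a noisy plane in three variables, one variable per column.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 3.0, 200)
b = rng.normal(0.0, 1.0, 200)
z = 0.5 * a + 0.3 * b + rng.normal(0.0, 0.05, 200)
data = np.column_stack([a, b, z])

scores, loadings = nipals_pca(data, n_components=3)
centered = data - data.mean(axis=0)
# With all three components kept, the centered data should be recovered.
print(np.allclose(scores @ loadings.T, centered))
```

Each pass extracts one component and then removes it from the data (deflation), which is why additional components can be extracted later from the leftover residual.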
Loadings and Scores
1. Call the loadings function to retrieve the data found in the second matrix of NIPALS_Result.
Each column of LOADINGS is a loading vector.
2. Call the scores function to retrieve the data found in the first matrix of NIPALS_Result.
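The scores represent the proportions in which the loading vectors are added to recreate the original (centered) data. Think of them as intensities. In matrix form, Data^T = LOADINGS · SCORES^T, where Data is the centered data set; equivalently, Data = SCORES · LOADINGS^T.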
3. Plot the data stored in the SCORES matrix.
The data has been rotated so that the first new variable (the first principal component) explains the maximum amount of variance. On the graph, this appears as the long axis of the elliptical cloud, which is now parallel to the x-axis. The values for the third variable, parallel to the z-axis, are very small, so for most purposes this variable can be discarded. Discarding it compresses the data from three variables to two.
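The relationship between scores, loadings, and the centered data can be checked numerically. The Python/NumPy sketch below uses a singular value decomposition instead of NIPALS to obtain the loadings, and it runs on a hypothetical synthetic data set, not the data of this example. It rebuilds the centered data from all three components and then measures how much is lost when the third component is dropped.

```python
import numpy as np

# Hypothetical data: a noisy plane in three variables, shifted away from the origin.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 3.0, 200)
b = rng.normal(0.0, 1.0, 200)
noise = rng.normal(0.0, 0.05, 200)
data = np.column_stack([a, b, 0.5 * a + 0.3 * b + noise]) + [10.0, -4.0, 2.0]

centered = data - data.mean(axis=0)

# SVD stands in for NIPALS here; the rows of vt are the loading vectors.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
loadings = vt.T                      # one loading vector per column
scores = centered @ loadings         # coordinates of each point in the rotated axes

rebuilt = scores @ loadings.T        # all three components: full reconstruction
approx2 = scores[:, :2] @ loadings[:, :2].T   # third component discarded

variances = scores.var(axis=0, ddof=1)
rel_error = np.linalg.norm(centered - approx2) / np.linalg.norm(centered)

print("variance of each score column:", variances)
print("relative error after dropping component 3:", rel_error)
```

The score columns come out ordered by decreasing variance, which is the numerical counterpart of the elliptical cloud lining up with the x-axis in the plot.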
PCA Variance and Eigenvalues
1. Use the PCAvariance function to return the cumulative variances of the three principal components.
The first two components account for 99.9% of the variance in the data.
2. Use the PCAeigenvals function to extract the eigenvalues of the principal components.
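The link between eigenvalues and cumulative variance can be illustrated with a short Python/NumPy sketch. It assumes that the eigenvalues are those of the sample covariance matrix of the centered data; the normalization used by PCAeigenvals and PCAvariance is not stated here and may differ. The data set is hypothetical, so the printed fractions are not the values from this example.

```python
import numpy as np

# Hypothetical data: a noisy plane in three variables, one variable per column.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 3.0, 200)
b = rng.normal(0.0, 1.0, 200)
noise = rng.normal(0.0, 0.05, 200)
data = np.column_stack([a, b, 0.5 * a + 0.3 * b + noise])

centered = data - data.mean(axis=0)
n = centered.shape[0]

# Assumption: eigenvalues of the sample covariance matrix, largest first.
cov = np.cov(centered, rowvar=False)
eigenvalues = np.linalg.eigvalsh(cov)[::-1]

# The same numbers follow from the singular values of the centered data.
singular_values = np.linalg.svd(centered, compute_uv=False)
from_svd = singular_values**2 / (n - 1)

# Cumulative fraction of the total variance explained by the first k components.
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()

print("eigenvalues:", eigenvalues)
print("cumulative variance fractions:", cumulative)
```

The last cumulative fraction is always 1, and a second fraction close to 1 indicates that the third component carries almost no variance and can be discarded.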