Example: Principal Component Analysis 2

Use the Nipals and Nipals2 functions to analyze complex data such as near-infrared (NIR) spectra of pharmaceutical tablets with five different dosages of the active ingredients (the data is courtesy of Bruker Optics, Inc.). A model can be created to distinguish each dosage based on the spectrum, even though the actual dosages are not known. This model can be used for Quality Control purposes when producing further tablets.

1. Define the following data set:

This data set describes a double-blind study in a clinical trial.

The first column is the wavenumber (1/wavelength), in cm⁻¹. The remaining 25 columns hold the spectra: five sequential spectra for each of the five dosages.

2. Use the submatrix, cols, and rows functions to extract the 25 spectra.

<region id="ID0EDV3U" actualWidth="413.8900000000001" actualHeight="25.6" top="441.60000000000008" left="38.400000000000006" xmlns="http://schemas.mathsoft.com/worksheet50">
  <math resultRef="282">
    <define xmlns="http://schemas.mathsoft.com/math50">
      <id labels="VARIABLE" xml:space="preserve">S</id>
      <parens>
        <apply>
          <id labels="FUNCTION" xml:space="preserve">submatrix</id>
          <sequence>
            <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
            <real>1</real>
            <apply>
              <minus />
              <apply>
                <id labels="FUNCTION" xml:space="preserve" label-is-contextual="true">rows</id>
                <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
              </apply>
              <real>1</real>
            </apply>
            <real>1</real>
            <apply>
              <minus />
              <apply>
                <id labels="FUNCTION" xml:space="preserve" label-is-contextual="true">cols</id>
                <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
              </apply>
              <real>1</real>
            </apply>
          </sequence>
        </apply>
      </parens>
    </define>
  </math>
</region>

3. Use the max and match functions to find the maximum value and the spectrum that contains it. Since the data values are very small, set TOL to an even smaller value.

The maximum value is in row 210 and column 17 of the Data matrix.
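
The role of the tolerance in step 3 can be illustrated outside the worksheet. The following is a minimal Python sketch, not Mathcad's own max or match: the helper `match`, its `tol` parameter, and the tiny `data` matrix are hypothetical stand-ins chosen for illustration.

```python
def match(value, matrix, tol):
    """Return (row, col) index pairs where |element - value| <= tol."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, x in enumerate(row)
        if abs(x - value) <= tol
    ]

# Made-up stand-in for Data: column 0 is the wavenumber, the rest are spectra.
data = [
    [4000.0, 1.0e-3, 2.0e-3],
    [4010.0, 7.8e-3, 4.0e-3],
    [4020.0, 5.0e-3, 7.5e-3],
]

# Maximum over the spectra columns only (column 0 would otherwise dominate).
m = max(x for row in data for x in row[1:])

tight = match(m, data, tol=1e-9)  # tolerance far below the data scale
loose = match(m, data, tol=1e-3)  # tolerance comparable to the data scale
print(tight)
print(loose)
```

With the tight tolerance only the position of the maximum is returned. With the loose tolerance a neighboring value of similar size is also reported, which is why the tolerance should be much smaller than the data values.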

4. Plot the first two spectra of each dosage, which totals 10 data sets. The data set pairs are columns [1,2], [6,7], [11,12], [16,17], and [21,22] of the Data matrix.

◦ To get a reasonable scale for the horizontal axis, divide the wavenumbers by 1000. Similarly, since the spectral values are small, multiply them by 1000.

◦ Use a horizontal marker to show the maximum spectral value.

<region id="ID0EL43U" actualWidth="576.20000000000039" actualHeight="597" top="864.00000000000011" left="38.400000000000006" height="597" width="576.20000000000039" xmlns="http://schemas.mathsoft.com/worksheet50">
  <plot background-type="white">
    <xyPlot>
      <title class="- topic/title " wwtype:type="Paragraph" xmlns:wwtype="urn:WebWorks-Type-Schema" />
      <legend />
      <traces>
        <trace resultRef="290">
          <traceStyle color="#FFED1D2F" symbol="none" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
        <trace resultRef="291">
          <traceStyle color="#FFED1D2F" symbol="none" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
        <trace resultRef="292">
          <traceStyle color="#FF068149" symbol="none" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
        <trace resultRef="293">
          <traceStyle color="#FF068149" symbol="none" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
        <trace resultRef="294">
          <traceStyle color="#FF2E3192" symbol="none" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
        <trace resultRef="295">
          <traceStyle color="#FF2E3192" symbol="none" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
        <trace resultRef="296">
          <traceStyle color="#FF662D91" symbol="none" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
        <trace resultRef="297">
          <traceStyle color="#FF662D91" symbol="none" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
        <trace resultRef="298">
          <traceStyle color="#FF000000" symbol="none" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
        <trace resultRef="299">
          <traceStyle color="#FF000000" symbol="none" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
      </traces>
      <graph-size width="403.4" height="481.8" />
      <axes>
        <xAxis rank="1" legend-position="PlotBoundaryBottom" start="-5.25" end="-4">
          <axisLine position="ticknumberlock" positionticmark="0" legendWidth="145.493333333333" />
          <axisGrid>
            <gridFrequency>6</gridFrequency>
            <gridLabels display="true" />
            <gridLines />
            <tickMarks display="true" />
          </axisGrid>
          <axisLabel />
          <markers />
          <numberFormat>
            <general precision="3" show-trailing-zeros="false" radix="dec" zero-threshold="15" imaginary-value="i" exponential-threshold="3" />
          </numberFormat>
          <plotEquations>
            <plotEquation>
              <math resultRef="306">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <neg />
                    <apply>
                      <matcol />
                      <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                      <real>0</real>
                    </apply>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>-3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="307">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
          </plotEquations>
          <xyDomain scale-type="linear" auto-scale="true">
            <startValue resultRef="301">
              <real xmlns="http://schemas.mathsoft.com/math50">-5.25</real>
            </startValue>
            <secondTickValue resultRef="303">
              <real xmlns="http://schemas.mathsoft.com/math50">-5</real>
            </secondTickValue>
            <endValue resultRef="305">
              <real xmlns="http://schemas.mathsoft.com/math50">-4</real>
            </endValue>
          </xyDomain>
        </xAxis>
        <yAxis rank="1" legend-position="PlotBoundaryLeft" start="0" end="8">
          <axisLine position="ticknumberlock" positionticmark="0" legendWidth="135.456666666667" />
          <axisGrid>
            <gridFrequency>9</gridFrequency>
            <gridLabels display="true" />
            <gridLines />
            <tickMarks display="true" />
          </axisGrid>
          <axisLabel />
          <markers>
            <marker resultRef="308" display="true" color="#FF662D91" line-style="Dash" value="7.8">
              <apply xmlns="http://schemas.mathsoft.com/math50">
                <mult />
                <id xml:space="preserve">M</id>
                <apply>
                  <pow />
                  <real>10</real>
                  <real>3</real>
                </apply>
              </apply>
            </marker>
          </markers>
          <numberFormat>
            <engineering use-e-notation="false" precision="3" show-trailing-zeros="false" radix="dec" imaginary-value="i" />
          </numberFormat>
          <plotEquations>
            <plotEquation>
              <math resultRef="316">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <matcol />
                    <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                    <real>1</real>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="317">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="318">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <matcol />
                    <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                    <real>2</real>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="319">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="320">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <matcol />
                    <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                    <real>6</real>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="321">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="322">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <matcol />
                    <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                    <real>7</real>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="323">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="324">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <matcol />
                    <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                    <real>11</real>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="325">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="326">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <matcol />
                    <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                    <real>12</real>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="327">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="328">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <matcol />
                    <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                    <real>16</real>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="329">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="330">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <matcol />
                    <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                    <real>17</real>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="331">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="332">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <matcol />
                    <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                    <real>21</real>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="333">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="334">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <matcol />
                    <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                    <real>22</real>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="335">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
          </plotEquations>
          <xyDomain scale-type="linear" auto-scale="true">
            <startValue resultRef="311">
              <real xmlns="http://schemas.mathsoft.com/math50">0</real>
            </startValue>
            <secondTickValue resultRef="313">
              <real xmlns="http://schemas.mathsoft.com/math50">1</real>
            </secondTickValue>
            <endValue resultRef="315">
              <real xmlns="http://schemas.mathsoft.com/math50">8</real>
            </endValue>
          </xyDomain>
        </yAxis>
      </axes>
    </xyPlot>
  </plot>
</region>

◦ By convention, wavenumbers are plotted in decreasing order. Column 0 of Data (the wavenumbers) is therefore negated so that the wavenumbers appear in that order.

◦ No part of the spectrum can be used to easily distinguish one dosage from another: they all have the same basic form and close absorbance values.

◦ Most of the data are redundant. There are 236 points in each spectrum, which means 236 measured variables (absorbance for a particular wavelength of light), but the variation of these points is clearly interrelated.

5. Split the Data matrix into two data sets: the wavenumbers (column 0) and the spectra for each tablet (submatrix S). To match the common convention, transpose the spectra for each tablet so that each column corresponds to an independent variable.

6. Define the number of Principal Components, as well as the maximum number of iterations, before applying the Nipals function to the data. The Nipals function centers the data, subtracting the mean spectrum from each row.

<region id="ID0ELG4U" actualWidth="475.42999999999995" actualHeight="25.6" top="1843.2000000000003" left="38.400000000000006" xmlns="http://schemas.mathsoft.com/worksheet50">
  <math resultRef="375">
    <define xmlns="http://schemas.mathsoft.com/math50">
      <id labels="*" xml:space="preserve">NIPALS1</id>
      <apply>
        <id labels="FUNCTION" xml:space="preserve" label-is-contextual="true">Nipals</id>
        <sequence>
          <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Spectra</id>
          <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">NumPC</id>
          <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">MaxIteration</id>
          <str xml:space="preserve">noscale</str>
          <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Acc</id>
        </sequence>
      </apply>
    </define>
    <resultFormat>
      <matrix size="12,12" offset="0,0" show-indices="false" expand-nested-arrays="false" />
    </resultFormat>
  </math>
</region>
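
For readers who want to see the kind of iteration behind step 6, here is a minimal pure-Python sketch of a NIPALS-style principal component extraction. It is an assumption-laden illustration, not the implementation of the Nipals function: the function name `nipals`, its parameters, the choice of starting column, and the convergence test are all hypothetical. It centers the data and applies no scaling.

```python
import math

def nipals(rows, num_pc, max_iter=100, acc=1e-12):
    """Return (scores, loadings, mean) for observations stored row-wise.

    scores[k] and loadings[k] are the score and loading vectors of
    component k (lists of components, not worksheet-style matrices).
    """
    n, m = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(m)]
    # Center: subtract the mean observation from each row.
    x = [[r[j] - mean[j] for j in range(m)] for r in rows]
    scores, loadings = [], []
    for _ in range(num_pc):
        # Start from the column with the largest sum of squares.
        k = max(range(m), key=lambda j: sum(r[j] ** 2 for r in x))
        t = [r[k] for r in x]
        if sum(v * v for v in t) == 0.0:
            break  # nothing left to explain
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: project the columns of x onto t, then normalize.
            p = [sum(x[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # Score: project the rows of x onto the loading.
            t_new = [sum(x[i][j] * p[j] for j in range(m)) for i in range(n)]
            change = math.sqrt(sum((a - b) ** 2 for a, b in zip(t_new, t)))
            t = t_new
            if change < acc:
                break
        # Deflate: remove this component before extracting the next one.
        x = [[x[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings, mean

# Made-up observations lying on a straight line: one component suffices.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
scores, loadings, mean = nipals(toy, num_pc=1)
rebuilt = [[scores[0][i] * loadings[0][j] + mean[j] for j in range(2)]
           for i in range(3)]
```

In this toy case `rebuilt` reproduces `toy` to within rounding error, because the centered data have only one direction of variation.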

7. Choose a spectrum to reconstruct.

8. Extract the scores and the loadings from the output of the Nipals function.

9. Use the mean function to calculate the mean spectrum.

10. Estimate the original spectrum by multiplying the matrix of loading vectors by the transposed matrix of scores, taking the column for the chosen spectrum, and then adding the mean spectrum.

<region id="ID0ELN4U" actualWidth="364.69333333333338" actualHeight="36.152" top="2169.6000000000004" left="38.400000000000006" xmlns="http://schemas.mathsoft.com/worksheet50">
  <math resultRef="387">
    <define xmlns="http://schemas.mathsoft.com/math50">
      <id labels="VARIABLE" xml:space="preserve">Model1</id>
      <apply>
        <plus />
        <apply>
          <matcol />
          <parens>
            <apply>
              <mult />
              <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Loadings</id>
              <apply>
                <transpose />
                <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Scores</id>
              </apply>
            </apply>
          </parens>
          <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">index</id>
        </apply>
        <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">MeanSpectrum</id>
      </apply>
    </define>
    <resultFormat>
      <matrix size="12,12" offset="0,0" show-indices="false" expand-nested-arrays="false" />
    </resultFormat>
  </math>
</region>
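
The definition of Model1 can also be written component by component. The notation below is introduced here for illustration and is not part of the worksheet: i stands for index, K for NumPC, t_{ik} for the element of Scores in row i and column k, p_k for column k of Loadings, x-bar for MeanSpectrum, and x-hat_i for Model1.

```latex
\hat{\mathbf{x}}_i \;=\; \bar{\mathbf{x}} \;+\; \sum_{k=1}^{K} t_{ik}\,\mathbf{p}_k
```

Column i of the product Loadings · Scoresᵀ is exactly this sum, so the reconstruction is the mean spectrum plus a weighted combination of the K loading vectors, with the scores of spectrum i as the weights.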

11. Plot the original and the reconstructed spectra. Scale the horizontal and vertical axes to get reasonable values.

<region id="ID0EJP4U" actualWidth="576.14074074074142" actualHeight="312" top="2284.8" left="38.400000000000006" height="312" width="576.14074074074142" xmlns="http://schemas.mathsoft.com/worksheet50">
  <plot background-type="white">
    <xyPlot>
      <title class="- topic/title " wwtype:type="Paragraph" xmlns:wwtype="urn:WebWorks-Type-Schema" />
      <legend />
      <traces>
        <trace resultRef="391">
          <traceStyle color="#FF00008B" symbol="x" line-weight="1" line-style="None">lines</traceStyle>
        </trace>
        <trace resultRef="392">
          <traceStyle color="#FFFF0000" symbol="none" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
      </traces>
      <graph-size width="403.340740740741" height="235.2" />
      <axes>
        <xAxis rank="1" legend-position="PlotBoundaryBottom" start="-5.25" end="-4">
          <axisLine position="bottom" positionticmark="-1" legendWidth="145.493333333333" />
          <axisGrid>
            <gridFrequency>6</gridFrequency>
            <gridLabels display="true" />
            <gridLines />
            <tickMarks display="true" />
          </axisGrid>
          <axisLabel />
          <markers />
          <numberFormat>
            <general precision="3" show-trailing-zeros="false" radix="dec" zero-threshold="15" imaginary-value="i" exponential-threshold="3" />
          </numberFormat>
          <plotEquations>
            <plotEquation>
              <math resultRef="399">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <neg />
                    <apply>
                      <matcol />
                      <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                      <real>0</real>
                    </apply>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>-3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="400">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
          </plotEquations>
          <xyDomain scale-type="linear" auto-scale="true">
            <startValue resultRef="394">
              <real xmlns="http://schemas.mathsoft.com/math50">-5.25</real>
            </startValue>
            <secondTickValue resultRef="396">
              <real xmlns="http://schemas.mathsoft.com/math50">-5</real>
            </secondTickValue>
            <endValue resultRef="398">
              <real xmlns="http://schemas.mathsoft.com/math50">-4.0</real>
            </endValue>
          </xyDomain>
        </xAxis>
        <yAxis rank="1" legend-position="PlotBoundaryLeft" start="0" end="8">
          <axisLine position="ticknumberlock" positionticmark="0" legendWidth="144.696666666667" />
          <axisGrid>
            <gridFrequency>5</gridFrequency>
            <gridLabels display="true" />
            <gridLines />
            <tickMarks display="true" />
          </axisGrid>
          <axisLabel />
          <markers />
          <numberFormat>
            <scientific use-e-notation="false" precision="3" show-trailing-zeros="false" radix="dec" imaginary-value="i" />
          </numberFormat>
          <plotEquations>
            <plotEquation>
              <math resultRef="405">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Original</id>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="406">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="407">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Model1</id>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="408">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
          </plotEquations>
          <xyDomain scale-type="linear" auto-scale="true">
            <startValue>
              <placeholder xmlns="http://schemas.mathsoft.com/math50" />
            </startValue>
            <secondTickValue resultRef="402">
              <real xmlns="http://schemas.mathsoft.com/math50">2</real>
            </secondTickValue>
            <endValue resultRef="404">
              <real xmlns="http://schemas.mathsoft.com/math50">8</real>
            </endValue>
          </xyDomain>
        </yAxis>
      </axes>
    </xyPlot>
  </plot>
</region>

All the spectra are well represented using only two principal components.

12. Rearrange the scores into two matrices. Each column of the matrices represents the scores for one of the five tablet dosages.

<region id="ID0EZR4U" actualWidth="185.36" actualHeight="38.952000000000005" top="2784.0000000000005" left="38.400000000000006" xmlns="http://schemas.mathsoft.com/worksheet50">
  <math resultRef="424">
    <define xmlns="http://schemas.mathsoft.com/math50">
      <apply>
        <indexer />
        <id labels="*" xml:space="preserve">Xdata</id>
        <sequence>
          <id labels="*" xml:space="preserve">p</id>
          <id labels="*" xml:space="preserve">q</id>
        </sequence>
      </apply>
      <apply>
        <indexer />
        <parens>
          <apply>
            <matcol />
            <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Scores</id>
            <real>0</real>
          </apply>
        </parens>
        <apply>
          <plus />
          <id xml:space="preserve">p</id>
          <apply>
            <mult />
            <id xml:space="preserve">q</id>
            <real>5</real>
          </apply>
        </apply>
      </apply>
    </define>
    <resultFormat>
      <matrix size="12,12" offset="0,0" show-indices="false" expand-nested-arrays="false" />
    </resultFormat>
  </math>
</region>

<region id="ID0EHS4U" actualWidth="184.72666666666669" actualHeight="38.952000000000005" top="2822.4000000000005" left="38.400000000000006" xmlns="http://schemas.mathsoft.com/worksheet50">
  <math resultRef="426">
    <define xmlns="http://schemas.mathsoft.com/math50">
      <apply>
        <indexer />
        <id labels="*" xml:space="preserve">Ydata</id>
        <sequence>
          <id labels="*" xml:space="preserve">p</id>
          <id labels="*" xml:space="preserve">q</id>
        </sequence>
      </apply>
      <apply>
        <indexer />
        <parens>
          <apply>
            <matcol />
            <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Scores</id>
            <real>1</real>
          </apply>
        </parens>
        <apply>
          <plus />
          <id xml:space="preserve">p</id>
          <apply>
            <mult />
            <id xml:space="preserve">q</id>
            <real>5</real>
          </apply>
        </apply>
      </apply>
    </define>
    <resultFormat>
      <matrix size="12,12" offset="0,0" show-indices="false" expand-nested-arrays="false" />
    </resultFormat>
  </math>
</region>
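
The index arithmetic p + 5·q used for Xdata and Ydata can be checked with a short Python sketch. The helper `regroup` is hypothetical, the placeholder values are not real scores, and the ranges of p and q (0 to 4 each) are assumed, since the worksheet regions defining them are not shown here.

```python
def regroup(column, group_size):
    """Reshape a flat list so that column q holds one group of consecutive entries.

    Element [p][q] of the result is column[p + q * group_size].
    """
    groups = len(column) // group_size
    return [
        [column[p + q * group_size] for q in range(groups)]
        for p in range(group_size)
    ]

# Placeholder for one column of Scores: 25 values, five per dosage.
scores_pc1 = list(range(25))
xdata = regroup(scores_pc1, 5)

first_dosage = [row[0] for row in xdata]   # entries 0..4
second_dosage = [row[1] for row in xdata]  # entries 5..9
```

Each column of the result collects the five consecutive spectra of one dosage, which is what allows each dosage to be plotted as its own trace.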

13. Plot the scores of the first factor against the scores of the second factor. Each dosage is shown in a different color.

<region id="ID0EXT4U" actualWidth="557.39259259259256" actualHeight="420.474074074074" top="2899.2000000000003" left="57.600000000000009" height="420.474074074074" width="557.39259259259234" xmlns="http://schemas.mathsoft.com/worksheet50">
  <plot background-type="white" origin-positioning="true">
    <xyPlot>
      <title class="- topic/title " wwtype:type="Paragraph" xmlns:wwtype="urn:WebWorks-Type-Schema" />
      <legend />
      <traces>
        <trace resultRef="428">
          <traceStyle color="#FF00008B" symbol="filledStar" line-weight="2" line-style="None">lines</traceStyle>
        </trace>
        <trace resultRef="429">
          <traceStyle color="#FF662D91" symbol="filledRectangle" line-weight="2" line-style="None">lines</traceStyle>
        </trace>
        <trace resultRef="430">
          <traceStyle color="#FFFF0000" symbol="filledRhombus" line-weight="2" line-style="None">lines</traceStyle>
        </trace>
        <trace resultRef="431">
          <traceStyle color="#FF068149" symbol="filledStar" line-weight="2" line-style="None">lines</traceStyle>
        </trace>
        <trace resultRef="432">
          <traceStyle color="#FF31ADC2" symbol="point" line-weight="2" line-style="None">lines</traceStyle>
        </trace>
      </traces>
      <graph-size width="422.992592592592" height="324.474074074074" />
      <axes>
        <xAxis rank="1" legend-position="PlotBoundaryBottom" start="-0.004" end="0.004">
          <axisLine position="origin" positionticmark="3" legendWidth="290.56" />
          <axisGrid>
            <gridFrequency>9</gridFrequency>
            <gridLabels display="true" />
            <gridLines />
            <tickMarks display="true" />
          </axisGrid>
          <axisLabel />
          <markers />
          <plotEquations>
            <plotEquation>
              <math resultRef="439">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <matcol />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Xdata</id>
                  <real>0</real>
                </apply>
              </math>
              <math resultRef="440">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="441">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <matcol />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Xdata</id>
                  <real>1</real>
                </apply>
              </math>
              <math resultRef="442">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="443">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <matcol />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Xdata</id>
                  <real>2</real>
                </apply>
              </math>
              <math resultRef="444">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="445">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <matcol />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Xdata</id>
                  <real>3</real>
                </apply>
              </math>
              <math resultRef="446">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="447">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <matcol />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Xdata</id>
                  <real>4</real>
                </apply>
              </math>
              <math resultRef="448">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
          </plotEquations>
          <xyDomain scale-type="linear" auto-scale="true">
            <startValue resultRef="434">
              <real xmlns="http://schemas.mathsoft.com/math50">-0.004</real>
            </startValue>
            <secondTickValue resultRef="436">
              <real xmlns="http://schemas.mathsoft.com/math50">-0.003</real>
            </secondTickValue>
            <endValue resultRef="438">
              <real xmlns="http://schemas.mathsoft.com/math50">0.004</real>
            </endValue>
          </xyDomain>
        </xAxis>
        <yAxis rank="1" legend-position="PlotBoundaryLeft" start="-0.003" end="0.003">
          <axisLine position="origin" positionticmark="4" legendWidth="104.886666666667" />
          <axisGrid>
            <gridFrequency>7</gridFrequency>
            <gridLabels display="true" />
            <gridLines />
            <tickMarks display="true" />
          </axisGrid>
          <axisLabel />
          <markers />
          <plotEquations>
            <plotEquation>
              <math resultRef="455">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <matcol />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Ydata</id>
                  <real>0</real>
                </apply>
              </math>
              <math resultRef="456">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="457">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <matcol />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Ydata</id>
                  <real>1</real>
                </apply>
              </math>
              <math resultRef="458">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="459">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <matcol />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Ydata</id>
                  <real>2</real>
                </apply>
              </math>
              <math resultRef="460">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="461">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <matcol />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Ydata</id>
                  <real>3</real>
                </apply>
              </math>
              <math resultRef="462">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="463">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <matcol />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Ydata</id>
                  <real>4</real>
                </apply>
              </math>
              <math resultRef="464">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
          </plotEquations>
          <xyDomain scale-type="linear" auto-scale="true">
            <startValue resultRef="450">
              <real xmlns="http://schemas.mathsoft.com/math50">-0.003</real>
            </startValue>
            <secondTickValue resultRef="452">
              <real xmlns="http://schemas.mathsoft.com/math50">-0.002</real>
            </secondTickValue>
            <endValue resultRef="454">
              <real xmlns="http://schemas.mathsoft.com/math50">0.003</real>
            </endValue>
          </xyDomain>
        </yAxis>
      </axes>
    </xyPlot>
  </plot>
</region>

Some grouping of the data is evident, but it is still hard to distinguish one dosage from another. Adding a third score to the plot might help.

14. Use the Nipals2 function to add four principal components to the existing two-component model.

The output matrix of NIPALS2 has the same form as that of NIPALS, but with additional columns and rows corresponding to the additional principal components. The number of score and loading vectors has increased from 2 to 6.
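
The idea of adding components to an existing model can be sketched outside Mathcad. The following Python sketch assumes the components are extracted one at a time by the NIPALS iteration with deflation, so that further components can be computed from the residual left by the earlier ones. The function `nipals`, its parameters, and the small matrix `X` are hypothetical illustrations. They are not the Mathcad Nipals or Nipals2 functions, whose internals and output layout may differ. The sketch also assumes one sample per row, whereas the worksheet stores the spectra as columns.

```python
import math

def nipals(E, n_components, tol=1e-12, max_iter=500):
    """Extract principal components from a column-centered matrix E.

    E is a list of rows (one row per sample). Returns (scores, loadings,
    residual); scores[k] and loadings[k] belong to component k.
    """
    E = [row[:] for row in E]                      # work on a copy
    n_rows, n_cols = len(E), len(E[0])
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the column with the largest sum of squares.
        j0 = max(range(n_cols),
                 key=lambda j: sum(E[i][j] ** 2 for i in range(n_rows)))
        t = [E[i][j0] for i in range(n_rows)]
        if sum(v * v for v in t) == 0.0:
            break                                  # nothing left to explain
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: project the residual onto the current score vector.
            p = [sum(E[i][j] * t[i] for i in range(n_rows)) / tt
                 for j in range(n_cols)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]              # unit-length loading
            # Score: project the residual onto the loading.
            t_new = [sum(E[i][j] * p[j] for j in range(n_cols))
                     for i in range(n_rows)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        # Deflate: remove this component from the residual.
        for i in range(n_rows):
            for j in range(n_cols):
                E[i][j] -= t[i] * p[j]
        scores.append(t)
        loadings.append(p)
    return scores, loadings, E

# Hypothetical column-centered data: 4 samples, 3 variables.
X = [[2.0, 0.5, -1.0], [-1.0, 1.5, 0.5], [0.5, -2.0, 1.5], [-1.5, 0.0, -1.0]]

scores2, loadings2, resid2 = nipals(X, 2)          # two-component model
extra_scores, extra_loadings, resid3 = nipals(resid2, 1)  # add one more
```

Because each component is removed from the residual before the next one is computed, calling `nipals` on `resid2` yields the same third component as requesting three components from `X` directly.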

15. Extract the loadings and the scores from the NIPALS2 matrix to create a new model of the chosen spectrum.

<region id="ID0E2Y4U" actualWidth="276.41333333333341" actualHeight="36.152" top="3705.6000000000004" left="38.400000000000006" xmlns="http://schemas.mathsoft.com/worksheet50">
  <math resultRef="499">
    <define xmlns="http://schemas.mathsoft.com/math50">
      <id labels="VARIABLE" xml:space="preserve">Model2</id>
      <apply>
        <plus />
        <apply>
          <matcol />
          <parens>
            <apply>
              <mult />
              <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">L</id>
              <apply>
                <transpose />
                <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">S</id>
              </apply>
            </apply>
          </parens>
          <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">index</id>
        </apply>
        <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">MeanSpectrum</id>
      </apply>
    </define>
    <resultFormat>
      <matrix size="12,12" offset="0,0" show-indices="false" expand-nested-arrays="false" />
    </resultFormat>
  </math>
</region>
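
The Model2 definition above takes column index of the product of L and the transpose of S, then adds MeanSpectrum. A minimal Python sketch of the same arithmetic follows. It assumes `L` holds one row per wavenumber and one column per principal component, and `S` holds one row per spectrum and one column per principal component. These shapes are inferred from the matrix product, not stated in the worksheet. The function name `reconstruct` and the toy numbers are hypothetical.

```python
def reconstruct(L, S, mean_spectrum, index, n_pc=None):
    """Rebuild one spectrum from loadings L, scores S, and the mean spectrum.

    Equivalent to column `index` of L * transpose(S), plus mean_spectrum,
    optionally using only the first n_pc principal components.
    """
    if n_pc is None:
        n_pc = len(L[0])
    return [
        mean_spectrum[i] + sum(L[i][k] * S[index][k] for k in range(n_pc))
        for i in range(len(L))
    ]

# Toy values: 3 wavenumbers, 2 spectra, 2 principal components.
L = [[1, 0], [0, 1], [1, 1]]
S = [[2, 3], [4, 5]]
mean_spectrum = [10, 20, 30]

full_model = reconstruct(L, S, mean_spectrum, index=1)          # both PCs
one_pc_model = reconstruct(L, S, mean_spectrum, index=1, n_pc=1)
```

Passing a smaller `n_pc` gives the lower-order model for the same spectrum, which is the comparison made between the two-component and six-component models.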

16. Plot and compare the two models for the chosen spectrum. Scale the horizontal and vertical axes to get reasonable values.

<region id="ID0EL14U" actualWidth="577.10370370370379" actualHeight="312" top="3782.4000000000005" left="38.400000000000006" height="312" width="577.10370370370379" xmlns="http://schemas.mathsoft.com/worksheet50">
  <plot background-type="white">
    <xyPlot>
      <title class="- topic/title " wwtype:type="Paragraph" xmlns:wwtype="urn:WebWorks-Type-Schema" />
      <legend />
      <traces>
        <trace resultRef="501">
          <traceStyle color="#FF000000" symbol="x" line-weight="1" line-style="None">lines</traceStyle>
        </trace>
        <trace resultRef="502">
          <traceStyle color="#FFFF0000" symbol="none" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
      </traces>
      <graph-size width="404.303703703704" height="235.2" />
      <axes>
        <xAxis rank="1" legend-position="PlotBoundaryBottom" start="-5.25" end="-4">
          <axisLine position="ticknumberlock" positionticmark="0" legendWidth="145.493333333333" />
          <axisGrid>
            <gridFrequency>6</gridFrequency>
            <gridLabels display="true" />
            <gridLines />
            <tickMarks display="true" />
          </axisGrid>
          <axisLabel />
          <markers />
          <numberFormat>
            <general precision="3" show-trailing-zeros="false" radix="dec" zero-threshold="15" imaginary-value="i" exponential-threshold="3" />
          </numberFormat>
          <plotEquations>
            <plotEquation>
              <math resultRef="509">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <apply>
                    <neg />
                    <apply>
                      <matcol />
                      <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Data</id>
                      <real>0</real>
                    </apply>
                  </apply>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>-3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="510">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
          </plotEquations>
          <xyDomain scale-type="linear" auto-scale="true">
            <startValue resultRef="504">
              <real xmlns="http://schemas.mathsoft.com/math50">-5.25</real>
            </startValue>
            <secondTickValue resultRef="506">
              <real xmlns="http://schemas.mathsoft.com/math50">-5</real>
            </secondTickValue>
            <endValue resultRef="508">
              <real xmlns="http://schemas.mathsoft.com/math50">-4</real>
            </endValue>
          </xyDomain>
        </xAxis>
        <yAxis rank="1" legend-position="PlotBoundaryLeft" start="0" end="8">
          <axisLine position="ticknumberlock" positionticmark="0" legendWidth="136.85" />
          <axisGrid>
            <gridFrequency>5</gridFrequency>
            <gridLabels display="true" />
            <gridLines />
            <tickMarks display="true" />
          </axisGrid>
          <axisLabel />
          <markers />
          <numberFormat>
            <scientific use-e-notation="false" precision="3" show-trailing-zeros="false" radix="dec" imaginary-value="i" />
          </numberFormat>
          <plotEquations>
            <plotEquation>
              <math resultRef="517">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Model1</id>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="518">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
            <plotEquation>
              <math resultRef="519">
                <apply xmlns="http://schemas.mathsoft.com/math50">
                  <mult />
                  <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true">Model2</id>
                  <apply>
                    <pow />
                    <real>10</real>
                    <real>3</real>
                  </apply>
                </apply>
              </math>
              <math resultRef="520">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
          </plotEquations>
          <xyDomain scale-type="linear" auto-scale="true">
            <startValue resultRef="512">
              <real xmlns="http://schemas.mathsoft.com/math50">0</real>
            </startValue>
            <secondTickValue resultRef="514">
              <real xmlns="http://schemas.mathsoft.com/math50">2</real>
            </secondTickValue>
            <endValue resultRef="516">
              <real xmlns="http://schemas.mathsoft.com/math50">8</real>
            </endValue>
          </xyDomain>
        </yAxis>
      </axes>
    </xyPlot>
  </plot>
</region>

17. Extract the cumulative variance from NIPALS2.

18. Plot the cumulative variance against the number of principal components.

<region id="ID0EL54U" actualWidth="575.2" actualHeight="312" top="4320.0000000000009" left="38.400000000000006" height="312" width="575.2" xmlns="http://schemas.mathsoft.com/worksheet50">
  <plot background-type="white" origin-positioning="true">
    <xyPlot>
      <title class="- topic/title " wwtype:type="Paragraph" xmlns:wwtype="urn:WebWorks-Type-Schema" />
      <legend />
      <traces>
        <trace resultRef="539">
          <traceStyle color="#FFED1D2F" symbol="filledRectangle" line-weight="1" line-style="Solid">lines</traceStyle>
        </trace>
      </traces>
      <graph-size width="421.6" height="254.4" />
      <axes>
        <xAxis rank="1" legend-position="PlotBoundaryBottom" start="0" end="5">
          <axisLine position="origin" positionticmark="0" legendWidth="93.8" />
          <axisGrid>
            <gridFrequency>6</gridFrequency>
            <gridLabels display="true" />
            <gridLines />
            <tickMarks display="true" />
          </axisGrid>
          <axisLabel />
          <markers />
          <numberFormat>
            <general precision="3" show-trailing-zeros="false" radix="dec" zero-threshold="15" imaginary-value="i" exponential-threshold="3" />
          </numberFormat>
          <plotEquations>
            <plotEquation>
              <math resultRef="542">
                <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true" xmlns="http://schemas.mathsoft.com/math50">N_PC</id>
              </math>
              <math resultRef="543">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
          </plotEquations>
          <xyDomain scale-type="linear" auto-scale="true">
            <startValue>
              <placeholder xmlns="http://schemas.mathsoft.com/math50" />
            </startValue>
            <secondTickValue resultRef="541">
              <real xmlns="http://schemas.mathsoft.com/math50">1</real>
            </secondTickValue>
            <endValue>
              <placeholder xmlns="http://schemas.mathsoft.com/math50" />
            </endValue>
          </xyDomain>
        </xAxis>
        <yAxis rank="1" legend-position="PlotBoundaryLeft" start="0" end="1.5">
          <axisLine position="origin" positionticmark="0" legendWidth="112.47" />
          <axisGrid>
            <gridFrequency>4</gridFrequency>
            <gridLabels display="true" />
            <gridLines />
            <tickMarks display="true" />
          </axisGrid>
          <axisLabel />
          <markers />
          <numberFormat>
            <general precision="3" show-trailing-zeros="false" radix="dec" zero-threshold="15" imaginary-value="i" exponential-threshold="3" />
          </numberFormat>
          <plotEquations>
            <plotEquation>
              <math resultRef="550">
                <id labels="VARIABLE" xml:space="preserve" label-is-contextual="true" xmlns="http://schemas.mathsoft.com/math50">CumVar</id>
              </math>
              <math resultRef="551">
                <placeholder xmlns="http://schemas.mathsoft.com/math50" />
              </math>
            </plotEquation>
          </plotEquations>
          <xyDomain scale-type="linear" auto-scale="true">
            <startValue resultRef="545">
              <real xmlns="http://schemas.mathsoft.com/math50">0</real>
            </startValue>
            <secondTickValue resultRef="547">
              <real xmlns="http://schemas.mathsoft.com/math50">0.5</real>
            </secondTickValue>
            <endValue resultRef="549">
              <real xmlns="http://schemas.mathsoft.com/math50">1.5</real>
            </endValue>
          </xyDomain>
        </yAxis>
      </axes>
    </xyPlot>
  </plot>
</region>

Although the first two principal components (PCs) represent 99% of the variance, it is the third PC that is key to grouping the data by dosage. Principal component analysis compresses the data to the most dominant factors, which are not necessarily the most relevant ones.
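
A cumulative variance curve of this kind can be computed from the score vectors. The Python sketch below assumes unit-length loadings, so that the sum of squares of each score vector is the variation captured by that component. It also assumes that cumulative variance means the running fraction of the total sum of squares of the centered data. How the Mathcad functions define and store this quantity is not shown here, and the name `cumulative_variance` and the toy numbers are hypothetical.

```python
from itertools import accumulate

def cumulative_variance(scores, total_ss):
    """Running fraction of total variation explained by successive components.

    scores[k] is the score vector of component k; total_ss is the sum of
    squares of the centered data matrix.
    """
    component_ss = [sum(v * v for v in t) for t in scores]
    return [c / total_ss for c in accumulate(component_ss)]

# Toy values: two components capturing 9 and 1 of 10 units of variation.
toy_scores = [[3.0, 0.0], [0.0, 1.0]]
cum_var = cumulative_variance(toy_scores, total_ss=10.0)
```

A curve that flattens after a few components shows how little variance the later components carry. As the third PC in this example shows, a small share of the variance can still carry the information of interest.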