Principal Components Analysis

A practical introduction to Principal Components Analysis (or Transformation) that aims to:
* highlight the importance of the values returned by PCA
* address numerical accuracy issues with respect to the default implementation of PCA in GRASS through the ''i.pca'' module.


''This page is still not complete''


= Principal Components Analysis =


Principal Components Analysis (PCA) is a dimensionality reduction technique used extensively in Remote Sensing studies (e.g. in change detection, image enhancement tasks and more). PCA is in fact a linear transformation applied to (usually) highly correlated multidimensional (e.g. multispectral) data. The input dimensions are transformed into a new coordinate system in which the produced dimensions (called principal components) contain, in decreasing order, the greatest variance associated with unchanged landscape features.




PCA has two algebraic solutions:


* '''Eigenvectors of Covariance''' (or Correlation) of a given data matrix
* '''Singular Value Decomposition''' of a given data matrix


The '''SVD''' method is preferred for numerical accuracy [R Documentation].


== Background ==

The basic steps of the transformation are:


# '''organizing a dataset in a matrix'''
# '''data centering''' (that is: subtracting each dimension's mean from that dimension, so that every dimension in the dataset has zero mean)
# calculate
#* the '''covariance matrix''' ('''non-standardised PCA''') or
#* the '''correlation matrix''' ('''standardised PCA''', also known as scaling)
# calculate either
#* the '''eigenvectors''' and '''eigenvalues''' of the covariance (or the correlation) matrix or
#* the '''SVD''' of the data matrix
# '''sort variances in decreasing order''' (decreasing eigenvalues; this is the default in eigenvalue analysis)
# '''project the original dataset ''signals''''' onto the eigenvectors to get the principal components (the PC scores: eigenvector * input data), as sketched below
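A minimal sketch of these steps in R (an illustration on toy data, not a GRASS command; the variable names are made up for the example):

 x  <- matrix(rnorm(300), ncol=3)          # step 1: a 100x3 data matrix (toy data)
 xc <- scale(x, center=TRUE, scale=FALSE)  # step 2: data centering (zero-mean columns)
 C  <- cov(xc)                             # step 3: covariance matrix (non-standardised PCA)
 e  <- eigen(C)                            # step 4: eigenvalues and eigenvectors
 e$values                                  # step 5: variances, already sorted in decreasing order
 scores <- xc %*% e$vectors                # step 6: PC scores = centered data * eigenvectors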
 
 
'''Why is data centering performed?'''
 
Data centering reduces the mean square error of approximating the input data [''A.A. Miranda, Y.-A. Le Borgne, and G. Bontempi. New Routes from Minimal Approximation Error to Principal Components, Volume 27, Number 3 / June, 2008, Neural Processing Letters, Springer''].
 
 
'''Why is data scaling performed?'''
 
Scaling normalises (i.e. standardises) the variables to have unit variance before the analysis takes place. This normalisation prevents certain features from dominating the analysis because of their large numerical values [''Duda and Hart (1973), Eklundh and Singh (1993)'']. However, "if the variables are measured on comparable scales, unstandardised data may be appropriate" [''Maindonald, John; Braun, John: Data Analysis and Graphics Using R. 2nd ed. 2007, Cambridge University Press''].
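The effect of scaling can be illustrated with an R one-liner (a toy check, not from the GRASS modules): the correlation matrix of a dataset equals the covariance matrix of the standardised data.

 x <- matrix(rnorm(300), ncol=3)
 all.equal(cor(x), cov(scale(x)))   # TRUE: correlation = covariance of scaled (unit-variance) data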


== Solutions to PCA ==

The '''Eigenvector''' solution to PCA involves:
 
# calculation of
#* the covariance matrix of the given multidimensional dataset (non-standardised PCA) '''''or'''''
#* the correlation matrix of the given multidimensional dataset (standardised PCA)
# calculation of the eigenvalues and eigenvectors of the covariance (or correlation) matrix
# transformation of the input dataset using the eigenvectors as weighting coefficients
 
== Terminology ==
 
'''eigenvalues''' represent either the variance of the original data contained in each principal component (in case they were computed from the covariance matrix), or the amount of correlation captured by the respective principal components (in case they were computed from the correlation matrix).
 
'''eigenvectors'''
 
* act as weighting coefficients
* represent the contribution of each original dimension to the principal components
* tell how the principal components relate to the data variables (dimensions).
 
Suppose an eigenvector is (0, 0, 1, 0); this indicates that the corresponding principal component is equal to dimension 3 and perpendicular to dimensions 1, 2 and 4. If it is (0.7, 0.7, 0, 0), the PC is equally determined by dimensions 1 and 2 (it averages them) and not determined by dimensions 3 and 4. If it is (0.7, 0, -0.7, 0), it is the difference between dimensions 1 and 3. Eigenvectors are normalized, meaning that the sum of their squared elements equals 1.
 
'''loadings''' The individual numbers in an eigenvector are called loadings.
 
'''scores''' When the original data are projected onto the eigenvectors, the resulting new variables are called the principal components; the new data values are called PC scores.
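The terms above can be verified with a small R sketch (toy data; the component names follow the ''princomp()'' documentation):

 x  <- matrix(rnorm(300), ncol=3)
 pc <- princomp(x)
 colSums(unclass(pc$loadings)^2)    # each eigenvector is normalised: the sums of squares equal 1
 head(pc$scores)                    # PC scores: the original data projected onto the eigenvectors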
 
== Performing PCA with GRASS ==
 
* '''''m.eigensystem'''''
 
The ''m.eigensystem'' module implements the eigenvector solution to PCA. The respective function in R is ''princomp()''. A comparison of their results confirms their almost identical performance. Specifically,
 
# the squared standard deviations (sdev<sup>2</sup>) reported by ''princomp()'' are (almost) identical with the variances (eigenvalues) reported by ''m.eigensystem''.
 
# ''princomp()'' scales the eigenvectors (also referred to as normalisation) and so does ''m.eigensystem''. The scaled (normalised) eigenvectors produced by ''m.eigensystem'' are marked with the capital letter N.
 
 
* '''''i.pca'''''
 
The ''i.pca'' module performs PCA based on the SVD solution with data centering but without scaling. A comparison of the results derived by ''i.pca'' and R's ''prcomp()'' function confirms this. Specifically, ''i.pca'' yields the same eigenvectors as R's ''prcomp()'' function does with the following options:
 
prcomp(x, center=TRUE, scale=FALSE)
 
where x is a numeric or complex matrix (or data frame) which provides the data for the principal components analysis (R Documentation).
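For a quick cross-check (a sketch, assuming a data matrix ''x'' as above), the parts of the ''prcomp()'' result that correspond to ''i.pca'''s report are:

 p <- prcomp(x, center=TRUE, scale=FALSE)
 p$rotation   # eigenvectors (compare with i.pca's vectors; signs may be flipped)
 p$sdev^2     # eigenvalues, i.e. the variances of the principal components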
 
== Useful details ==
 
'''''princomp()'''''
 
If one computes principal components with R, the loadings are printed by default such that loadings close to 0 (between -0.1 and 0.1; this can be controlled) are left out. This can be overridden (see the help page of the function ''loadings''). The reason for this is that for large loading tables the real information is in the loadings not close to 0; reading large loading tables is much easier when only important loadings are printed.
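For example (a sketch, assuming a data matrix ''x''):

 print(princomp(x)$loadings, cutoff=0)   # print all loadings, however close to 0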
 
= Examples using SPOT imagery =
 
Get [http://grass.osgeo.org/sampledata/imagery60_grassdata.tar.gz SPOT images from the Spearfish dataset]
 
 
== Eigenvectors solution ==
 
=== Based on the covariance matrix ===
 
==== Using '''''m.eigensystem''''' ====
 
Note: download ''m.eigensystem'' from [[GRASS_AddOns#m.eigensystem|GRASS-Addons]] or simply install it with

 g.extension extension=m.eigensystem
 
 
Command in one line
 
(echo 3; r.covar spot.ms.1,spot.ms.2,spot.ms.3 | tail -n 3) | m.eigensystem
 
 
Output
r.covar: complete ...
  100%
<span style="color:#009000">E      1159.7452017844          .0000000000    88.38</span>
V          .6910021591          .0000000000
V          .7205280412          .0000000000
V          .4805108400          .0000000000
<span style="color:#900000">N          '''.6236808478'''          .0000000000
N          '''.6503301526'''          .0000000000
N          '''.4336967751'''          .0000000000</span>
<span style="color:#000090">W        21.2394712045          .0000000000
W        22.1470141296          .0000000000
W        14.7695575384          .0000000000</span>
<span style="color:#009000">E        5.9705414972          .0000000000      .45</span>
V          .7119385973          .0000000000
V        -.6358200627          .0000000000
V        -.0703936743          .0000000000
<span style="color:#900000">N          .7438340890          .0000000000
N        -.6643053754          .0000000000
N        -.0735473745          .0000000000</span>
<span style="color:#000090">W        1.8175356507          .0000000000
W        -1.6232096923          .0000000000
W        -.1797107407          .0000000000</span>
<span style="color:#009000">E      146.5031967184          .0000000000    11.16</span>
V          .2265837636          .0000000000
V          .3474697082          .0000000000
V        -.8468727535          .0000000000
<span style="color:#900000">N          .2402770238          .0000000000
N          .3684685345          .0000000000
N        -.8980522763          .0000000000</span>
<span style="color:#000090">W        2.9082771721          .0000000000
W        4.4598880523          .0000000000
W      -10.8698904856          .0000000000</span>
 
''Note that the output is not sorted by relative importance.''
 
The <span style="color:#900000">'''red-bold'''</span> numbers are the normalised eigen vectors. Compare the above results with R's ''princomp()'' function below.
 
==== Using R's '''''princomp()''''' function ====
 
Launch R from within GRASS
R
 
 
Load the ''spgrass6'' package and the projection information in R
library(spgrass6)
G <- gmeta6()
 
 
Read spot.ms bands
spot.ms <- readRAST6(c('spot.ms.1', 'spot.ms.2', 'spot.ms.3'))
 
 
Work around NA's
spot.ms.nas <- which(is.na(spot.ms@data$spot.ms.1) & is.na(spot.ms@data$spot.ms.2) & is.na(spot.ms@data$spot.ms.3))
spot.ms.values <- which(!is.na(spot.ms@data$spot.ms.1) & !is.na(spot.ms@data$spot.ms.2) & !is.na(spot.ms@data$spot.ms.3))
spot.ms.nonas <- spot.ms@data[spot.ms.values, ]
 
A better option is to use R's ''complete.cases()'' function. See ''?complete.cases'' within R.
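A minimal sketch of that alternative (using the same ''spot.ms'' object as above):

 spot.ms.nonas <- spot.ms@data[complete.cases(spot.ms@data), ]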
 
Command
 
princomp(spot.ms.nonas)
 
 
Another possible way is to use the function ''na.omit()''; the outputs are exactly the same:
 
princomp(na.omit(spot.ms))
 
 
Output
 
Call:
princomp(x = spot.ms.nonas)
Standard deviations:
  Comp.1  Comp.2  Comp.3
34.055018 12.103846  2.443468
3  variables and  1231860 observations.
 
 
Get loadings
 
princomp(spot.ms.nonas)$loadings
Loadings:
    Comp.1 Comp.2 Comp.3
spot.ms.1 <span style="color:#900000">'''-0.624'''</span>  0.240  0.744
spot.ms.2 <span style="color:#900000">'''-0.650'''</span> -0.368 -0.664
spot.ms.3 <span style="color:#900000">'''-0.434'''</span>  0.898
                Comp.1 Comp.2 Comp.3
SS loadings    1.000  1.000  1.000
Proportion Var  0.333  0.333  0.333
Cumulative Var  0.333  0.667  1.000
 
The <span style="color:#900000">'''red-bold'''</span> eigenvector elements of the first principal component ease the comparison with the results derived from GRASS' ''m.eigensystem'' module above.
 
Note the missing loading in row 3, column 3. It is omitted on purpose (its absolute value is below the default cutoff of 0.1). You will appreciate how useful this is when you compute 20 PC's from 120 dimensions. It will be printed when you explicitly ask for showing loadings between -0.1 and 0.1:
 
print(princomp(spot.ms.nonas)$loadings, cutoff=0)
 
=== Based on the correlation matrix ===
 
==== Using '''''m.eigensystem'''''====
 
 
Command in one line
 
(echo 3; r.covar -r spot.ms.1,spot.ms.2,spot.ms.3 | tail -n 3) | m.eigensystem
 
 
Output
 
r.covar: complete ...
  100%
<span style="color:#009000">E        2.5897901435          .0000000000    86.33</span>
V        -.7080795162          .0000000000
V        -.6979341819          .0000000000
V        -.6128387525          .0000000000
<span style="color:#900000">N        -.6062688915          .0000000000
N        -.5975822957          .0000000000
N        -.5247222419          .0000000000</span>
<span style="color:#000090">W        -.9756579133          .0000000000
W        -.9616787268          .0000000000
W        -.8444263177          .0000000000</span>
<span style="color:#009000">E          .0123265666          .0000000000      .41</span>
V        -.6690685456          .0000000000
V          .6302711261          .0000000000
V          .0552608155          .0000000000
<span style="color:#900000">N        -.7265842171          .0000000000
N          .6844516242          .0000000000
N          .0600112449          .0000000000</span>
<span style="color:#000090">W        -.0806690651          .0000000000
W          .0759912909          .0000000000
W          .0066627528          .0000000000</span>
<span style="color:#009000">E          .3978832898          .0000000000    13.26</span>
V          .3005969725          .0000000000
V          .3883277727          .0000000000
V        -.7895613377          .0000000000
<span style="color:#900000">N          '''.3232853332'''          .0000000000
N          '''.4176378502'''          .0000000000
N        '''-.8491555920'''          .0000000000</span>
<span style="color:#000090">W          .2039218921          .0000000000
W          .2634375639          .0000000000
W        -.5356302845          .0000000000</span>
 
''Note that the output is not sorted by relative importance. In this example the second principal component (which accounts for 13.26% of the original variance) can be created by using the numbers (i.e. the W lines) from the third group of eigen vectors. To compare with princomp()'s results, look at column '''Comp.2''' below.''
 
==== Using R's '''''princomp()''''' function ====
 
 
Perform PCA
 
princomp(spot.ms.nonas, cor=TRUE)
 
 
Output
 
Call:
princomp(x = spot.ms.nonas, cor = TRUE)
Standard deviations:
    Comp.1    Comp.2    Comp.3
1.6092826 0.6307795 0.1110256
3  variables and  1231860 observations.
 
 
Get loadings
 
princomp(spot.ms.nonas, cor=TRUE)$loadings
 
 
Output
 
Loadings:
          Comp.1 Comp.2 Comp.3
spot.ms.1 -0.606 <span style="color:#900000">'''-0.323'''</span>  0.727
spot.ms.2 -0.598 <span style="color:#900000">'''-0.418'''</span> -0.684
spot.ms.3 -0.525  <span style="color:#900000">'''0.849'''</span>     
                Comp.1 Comp.2 Comp.3
SS loadings    1.000  1.000  1.000
Proportion Var  0.333  0.333  0.333
Cumulative Var  0.333  0.667  1.000
 
== SVD solution ==
 
==== Using ''i.pca'' ====
 
Command
<source lang="bash">
i.pca input=spot.ms.1,spot.ms.2,spot.ms.3 out=pca.spot.ms rescale=0,0
</source>
 
Output
 
  Eigen values, (vectors), and [percent importance]:
  PC1  '''1170.12''' (<span style="color:#900090">-0.6251,-0.6536,-0.4268</span>)[88.07%]
  PC2    152.49 ( 0.2328, 0.3658,-0.9011)[11.48%]
  PC3      6.01 ( 0.7450,-0.6626,-0.0765) [0.45%]
 
==== Using R's ''prcomp()'' function ====
 
The following example replicates ''i.pca'''s solution using the same data '''with data centering and without scaling''' (options ''center=TRUE'' and ''scale=FALSE''). Note that these settings are the defaults.
 
 
Command
 
prcomp(spot.ms.nonas)
 
Output
 
Standard deviations:
[1] 34.055032 12.103851  2.443469
Rotation:
                  PC1        PC2        PC3
spot.ms.1 <span style="color:#900090">-0.6236808</span>  0.2402770 -0.74383409
spot.ms.2 <span style="color:#900090">-0.6503302</span>  0.3684685  0.66430538
spot.ms.3 <span style="color:#900090">-0.4336968</span> -0.8980523  0.07354738
 


[[Comments]]

The performance of ''i.pca'' seems to be identical to that of R's ''prcomp()'' function with its default settings, which means that data centering is applied prior to the actual PCA.


The three SPOT bands used in this example have the following ranges:

* spot.ms.1: min=24, max=254
* spot.ms.2: min=14, max=254
* spot.ms.3: min=12, max=254


Nevertheless, as shown in the examples using MODIS surface reflectance products (below), in which the ranges of the bands vary significantly, the results of ''i.pca'' do not match the results of R's ''prcomp()'' function with the parameter ''center'' set to ''TRUE''. Instead, the results derived from ''i.pca'' are almost identical when the parameter ''center'' is set to ''FALSE''.


The following example performs PCA '''without data centering and scaling''' (options ''center=FALSE'' and ''scale=FALSE'').

Command

 prcomp(spot.ms.nonas, center=FALSE, scale=FALSE)


Output

 Standard deviations:
 [1] '''101.00927''' 15.79861  2.83775
 Rotation:
                   PC1        PC2        PC3
 spot.ms.1 <span style="color:#900090">-0.5605487</span> -0.3652694  0.7432116
 spot.ms.2 <span style="color:#900090">-0.4726613</span> -0.5958032 -0.6493149
 spot.ms.3 <span style="color:#900090">-0.6799827</span>  0.7152600 -0.1613280


[[Comments]]

'''+++'''


The following example performs PCA '''with data centering and scaling''' (options ''center=TRUE'' and ''scale=TRUE'').

Command

 prcomp(spot.ms.nonas, center=TRUE, scale=TRUE)


Output

 Standard deviations:
 [1] 1.6092826 0.6307795 0.1110256
 Rotation:
                   PC1        PC2         PC3
 spot.ms.1 <span style="color:#900090">-0.6062688</span>  0.3232856 -0.72658417
 spot.ms.2 <span style="color:#900090">-0.5975823</span>  0.4176378  0.68445170
 spot.ms.3 <span style="color:#900090">-0.5247224</span> -0.8491555  0.06001103

= Examples using MODIS surface reflectance products =

== Eigenvectors solution ==

=== Based on the covariance matrix ===

==== Using '''''m.eigensystem''''' ====


Command in one line

 (echo 3; r.covar b02,b06,b07) | m.eigensystem


Output

* The <span style="color:#009000">E</span> line is the eigen value (real part, imaginary part, percent importance).
* The <tt>V</tt> lines are the eigen vectors associated with <span style="color:#009000">E</span>.
* The <span style="color:#900000">N</span> lines are the <tt>V</tt> vectors normalized to have a magnitude of 1. '''These are the scaled eigen vectors that correspond to princomp()'s results presented in the following section.'''
* The <span style="color:#000090">W</span> lines are the <span style="color:#900000">N</span> vectors multiplied by the square root of the magnitude of the eigen value (<span style="color:#009000">E</span>). '''These are generally referred to as "the factor loadings", also called "component loadings".'''

  r.covar: complete ...
   100%
  <span style="color:#009000">E    778244.0258462029          .0000000000    79.20</span>
  V          .5006581842          .0000000000
  V          .8256483300          .0000000000
  V          .6155834548          .0000000000
  <span style="color:#900000">N         '''.4372107421'''          .0000000000
  N         '''.7210155161'''          .0000000000
  N         '''.5375717557'''          .0000000000</span>
  <span style="color:#000090">W      385.6991853500          .0000000000
  W      636.0664787886          .0000000000
  W      474.2358050886          .0000000000</span>
  
  <span style="color:#009000">E    192494.5769628266          .0000000000    19.59</span>
  V         -.8689798010          .0000000000
  V          .0996340298          .0000000000
  V          .5731134848          .0000000000
  <span style="color:#900000">N         -.8309940700          .0000000000
  N          .0952787255          .0000000000
  N          .5480609638          .0000000000</span>
  <span style="color:#000090">W      -364.5920328433          .0000000000
  W        41.8027823088          .0000000000
  W       240.4573848757          .0000000000</span>
  
  <span style="color:#009000">E     11876.4548199713          .0000000000     1.21</span>
  V          .2872248982          .0000000000
  V         -.5731591248          .0000000000
  V          .5351449518          .0000000000
  <span style="color:#900000">N          .3439413070          .0000000000
  N         -.6863370819          .0000000000
  N          .6408165005          .0000000000</span>
  <span style="color:#000090">W        37.4824307850          .0000000000
  W       -74.7964308085          .0000000000
  W        69.8356366100          .0000000000</span>
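For reference, the <span style="color:#900000">N</span> and <span style="color:#000090">W</span> lines can be reproduced in R (a minimal sketch; here ''C'' is assumed to hold the 3x3 covariance matrix printed by ''r.covar''):

 e <- eigen(C)                     # eigen decomposition of the covariance matrix
 N <- e$vectors                    # the N lines: eigenvectors normalized to unit length
 W <- N %*% diag(sqrt(e$values))   # the W lines: N scaled by the square roots of the eigenvalues (E)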
 
 
In general, the solution to the eigen system results in complex numbers (with both real and imaginary parts). However, in the example above, since the input matrix is symmetric (i.e., inverting the rows and columns gives the same matrix), the eigen system has only real values (i.e., the imaginary part is zero). This fact makes it possible to use eigen vectors to perform the principal component transformation of data sets: the covariance or correlation matrix of any data set is symmetric and thus has only real eigen values and vectors.
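This can be checked directly in R (an illustrative snippet on toy data):

 C <- cov(matrix(rnorm(300), ncol=3))
 isSymmetric(C)     # TRUE: covariance matrices are symmetric
 eigen(C)$values    # hence all eigenvalues are real (zero imaginary part)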
 
The <span style="color:#900000">'''red-bold'''</span> eigenvector elements of the first principal component ease the comparison with the results derived from R's ''princomp()'' function that follows.
 
 
<strike>Using the <span style="color:#000090">W</span> vector, new maps can be created.</strike> The new coordinate system (the new maps) is formed by the normalized eigenvectors <span style="color:#900000">(N)</span> of the variance–covariance (or correlation) matrix [''Tso, B. & P.M. Mather. Classification methods for remotely sensed data. 2001. Taylor & Francis, London; New York''].


 r.mapcalc 'pc.1 =  <span style="color:#900000">.4372107421</span>*map.1 <span style="color:#900000">+ .7210155161</span>*map.2 <span style="color:#900000">+ .5375717557</span>*map.3'
 r.mapcalc 'pc.2 = <span style="color:#900000">-.8309940700</span>*map.1 <span style="color:#900000">+ .0952787255</span>*map.2 <span style="color:#900000">+ .5480609638</span>*map.3'
 r.mapcalc 'pc.3 =  <span style="color:#900000">.3439413070</span>*map.1 <span style="color:#900000">- .6863370819</span>*map.2 <span style="color:#900000">+ .6408165005</span>*map.3'
 
Visualize results:
d.mon x0
d.rast pc.1
d.rast pc.2
d.rast pc.3
 
==== Using R's '''''princomp()''''' function ====


Command


  princomp(modis)


Output

 Call:
 princomp(x = modis)
 Standard deviations:
   Comp.1   Comp.2   Comp.3
 857.5737 436.0922 108.5083
 3  variables and  350596 observations.


Get loadings


  (princomp(modis))$loadings
  Loadings:
     Comp.1 Comp.2 Comp.3
  b02 <span style="color:#900000">'''-0.418'''</span>  0.839  0.348
  b06 <span style="color:#900000">'''-0.725'''</span>        -0.684
  b07 <span style="color:#900000">'''-0.547'''</span> -0.539  0.641
 
                Comp.1 Comp.2 Comp.3
SS loadings    1.000  1.000  1.000
Proportion Var  0.333  0.333  0.333
Cumulative Var  0.333  0.667  1.000
 
The <span style="color:#900000">'''red-bold'''</span> eigenvector elements of the first principal component ease the comparison with the results derived from GRASS' ''m.eigensystem'' module above.
 
=== Based on the correlation matrix ===
 
==== Using '''''m.eigensystem'''''====
 
 
Command in one line
 
(echo 3; r.covar -r MOD07_b02,MOD07_b06,MOD07_b07)|m.eigensystem
 
 
Output
 
r.covar: complete ...
  100%
<span style="color:#009000">E        2.2915877718          .0000000000    76.39</span>
V        -.5755655569          .0000000000
V        -.7660355041          .0000000000
V        -.6809380186          .0000000000
<span style="color:#900000">N        -.4896413269          .0000000000
N        -.6516766616          .0000000000
N        -.5792830912          .0000000000</span>
<span style="color:#000090">W        -.7412186091          .0000000000
W        -.9865075560          .0000000000
W        -.8769182329          .0000000000</span>
<span style="color:#009000">E          .6740687010          .0000000000    22.47</span>
V          .8667178982          .0000000000
V        -.1116525720          .0000000000
V        -.6069908335          .0000000000
<span style="color:#900000">N          '''.8145815825'''          .0000000000
N        '''-.1049362531'''          .0000000000
N        '''-.5704780699'''          .0000000000</span>
<span style="color:#000090">W          .6687852213          .0000000000
W        -.0861544341          .0000000000
W        -.4683721194          .0000000000</span>
<span style="color:#009000">E          .0343435272          .0000000000    1.14</span>
V          .2486404469          .0000000000
V        -.6006166822          .0000000000
V          .4655120098          .0000000000
<span style="color:#900000">N          .3109794470          .0000000000
N        -.7512029762          .0000000000
N          .5822249325          .0000000000</span>
<span style="color:#000090">W          .0576307320          .0000000000
W        -.1392129859          .0000000000
W          .1078979635          .0000000000</span>
 
The <span style="color:#900000">'''red-bold'''</span> eigenvector elements of the second principal component ease the comparison with the results derived from R's ''princomp()'' function that follows.
 
==== Using R's '''''princomp()''''' function ====
 
 
Command
 
princomp(mod07, cor=TRUE)
 
 
Output
 
Call:
princomp(x = mod07, cor = TRUE)
 
Standard deviations:
    Comp.1    Comp.2    Comp.3
1.5030740 0.8397807 0.1885121
  3  variables and  350596 observations.
 
 
Get loadings
 
(princomp(mod07, cor=TRUE))$loadings


Output
Loadings:
                              Comp.1 Comp.2 Comp.3
MOD2007_242_500_sur_refl_b02 -0.481  <span style="color:#900000">'''0.820'''</span>  0.310
MOD2007_242_500_sur_refl_b06 -0.656 <span style="color:#900000">'''-0.102'''</span> -0.748
MOD2007_242_500_sur_refl_b07 -0.582 <span style="color:#900000">'''-0.563'''</span>  0.587
                 Comp.1 Comp.2 Comp.3
  SS loadings    1.000  1.000  1.000
  Proportion Var  0.333  0.333  0.333
  Cumulative Var  0.333  0.667  1.000
The <span style="color:#900000">'''red-bold'''</span> eigenvector elements of the second principal component ease the comparison with the results derived from GRASS' ''m.eigensystem'' module above.


[[Comments]]

'''Add comments here...'''


== SVD ==

==== Using ''i.pca'' ====


Command

 i.pca input=b2,b6,b7 output=pca.b267


Output

 Eigen values, (vectors), and [percent importance]:
 PC1 '''6307563.04''' ( <span style="color:#900090">-0.64, -0.65, -0.42</span> ) [ 98.71% ]
 PC2    78023.63 ( -0.71,  0.28,  0.64 ) [  1.22% ]
 PC3     4504.60 ( -0.30,  0.71, -0.64 ) [  0.07% ]
 
 
==== Using R's ''prcomp()'' function ====

The following example shows that ''i.pca'' does not perform data centering in this specific case:


Command

 prcomp(mod07, center=FALSE, scale=FALSE)


Output

 Standard deviations:
 [1] '''4288.3788'''  476.8904  114.3971
 Rotation:
                                     PC1        PC2        PC3
 MOD2007_242_500_sur_refl_b02 <span style="color:#900090">-0.6353238</span>  0.7124070 -0.2980602
 MOD2007_242_500_sur_refl_b06 <span style="color:#900090">-0.6485551</span> -0.2826985  0.7067234
 MOD2007_242_500_sur_refl_b07 <span style="color:#900090">-0.4192135</span> -0.6423066 -0.6416403


[[Comments]]

* Performing data centering manually in GRASS (using ''r.mapcalc'') and repeating the ''i.pca'' run gives the same results as prcomp(x, center=TRUE, scale=FALSE). '''Example to be added.'''
* The eigenvector matrices match, although ''prcomp()'' reports loadings (=eigenvectors) column-wise and ''i.pca'' row-wise.
* The eigenvalues do '''not''' match. To exemplify, the standard deviation for PC1 reported by ''prcomp()'' is '''4288.3788''', whereas the variance reported by ''i.pca'' is 6307563.04 [ sqrt(6307563.04) = '''2511.486''' ].


'''More examples to be added'''


== References ==

Jon Shlens, "Tutorial on Principal Component Analysis, Dec 2005," [http://www.snl.salk.edu/~shlens/pca.pdf] (accessed on March, 2009).


== e-mails in GRASS-user mailing list ==

There are many posts concerning the functionality of ''i.pca''. Most of them question the non-reporting of eigenvalues (an issue recently fixed).


'''Posts in the grass-user mailing list'''

* [http://osgeo-org.1803224.n2.nabble.com/i.pca-output-td1863271.html#a1863271]
* [http://osgeo-org.1803224.n2.nabble.com/Testing-i.pca-~-prcomp()%2C-m.eigensystem-~-princomp()-td2413700.html#a2415727 Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()]
* [http://osgeo-org.1803224.n2.nabble.com/Calculating-eigen-values-and---variance-explained-after-PCA-analysis-td2383005.html#a2383165]
* [http://osgeo-org.1803224.n2.nabble.com/Re%3A-Calculating-eigen-valuesand-varianceexplainedafter-PCA-analysis-td2413881.html#a2413881]
* [http://osgeo-org.1803224.n2.nabble.com/Re%3A-Calculating-eigen-values-and--varianceexplainedafter-PCA-analysis-td2395558.html#a2409630]
* [http://osgeo-org.1803224.n2.nabble.com/i-pca-fixes-in-trunk-td6962364.html] (2011)


'''More sources to be added'''

[[Category: Documentation]]
[[Category: Image processing]]
