DESCRIPTION OF PCA METHOD

History:

Karl Pearson first described PCA (Principal Component Analysis) in 1901, from a geometric point of view. In 1933, Harold Hotelling presented PCA as a means of finding a least-squares estimator of a multi-dimensional Gaussian statistical model. But it has only been in common use since the rise of computing tools (around 1950).

Aim:
When you want to observe a large (n objects) x (p variables) data set, you can represent your n objects in a p-dimensional space. The idea of PCA is to find an f-dimensional subspace (f small) that represents well the main variations of the cloud of n objects. To do this, you look for the subspace that maximizes the variance of the projected cloud. Moreover, you choose the basis given by the singular value decomposition of X, so that the PCA solutions are nested: the best f-dimensional subspace contains the best (f-1)-dimensional one.
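To make the variance-maximization idea concrete, here is a small numerical check (a NumPy sketch with synthetic data, not part of the original method description): the first principal axis of a centered cloud carries at least as much variance as any other unit direction.

```python
import numpy as np

rng = np.random.default_rng(0)
# A synthetic, anisotropic cloud of n = 200 objects in p = 3 dimensions.
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])
Xc = X - X.mean(axis=0)  # PCA works on the centered cloud

# The first principal axis is the top right-singular vector of Xc.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
v1 = Vt[0]

# Compare the variance along v1 with the variance along random unit vectors.
best = np.var(Xc @ v1)
others = []
for _ in range(500):
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)
    others.append(np.var(Xc @ u))

print(best >= max(others))  # prints True: the first axis dominates
```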

Criterion:
Find U, V in O and D in D minimizing

|| X - U D V' ||^2

with O the set of orthonormal matrices, and D the set of diagonal square matrices.
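This criterion can be checked numerically: by the Eckart-Young theorem, truncating the SVD to the f largest singular values attains the minimum of the least-squares criterion among rank-f approximations. A short NumPy sketch (the data here are synthetic, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))          # synthetic (n objects) x (p variables)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

f = 2
# Keeping the f largest singular values gives the best rank-f fit of X.
X_best = (U[:, :f] * s[:f]) @ Vt[:f]
err_best = np.linalg.norm(X - X_best)

# Any other choice of f singular triplets fits worse, e.g. triplets 1 and 3.
keep = [0, 2]
X_alt = (U[:, keep] * s[keep]) @ Vt[keep]
err_alt = np.linalg.norm(X - X_alt)

print(err_best < err_alt)  # prints True
```

The squared error of the truncated SVD equals the sum of the squared discarded singular values, which is why keeping the largest ones is optimal.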

Algorithm:
Many algorithms can be used, mainly the singular value decomposition (SVD) of the centered data matrix, or the eigendecomposition of its covariance matrix.

Code:
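As an illustration only (no code accompanies this description), a minimal PCA routine based on the SVD might look like the following sketch; the function name and the test data are assumptions.

```python
import numpy as np

def pca(X, f):
    """Project the (n x p) data matrix X onto its first f principal components.

    Returns the (n x f) scores, the (p x f) loadings (orthonormal axes),
    and the share of total variance carried by each kept component.
    """
    Xc = X - X.mean(axis=0)                 # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :f] * s[:f]               # coordinates in the subspace
    loadings = Vt[:f].T                     # principal axes as columns
    explained = s**2 / np.sum(s**2)         # variance ratios, all components
    return scores, loadings, explained[:f]

# Hypothetical usage on random data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
scores, loadings, ratios = pca(X, f=2)
print(scores.shape, loadings.shape)  # prints (100, 2) (4, 2)
```

Because the axes come from the SVD, calling pca with a larger f simply appends components: the solutions are nested, as noted in the Aim section.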

Applications:

There are many applications of PCA, since it is one of the oldest and most widely used data analysis methods, easy to use and to interpret: for example, process control (spectral analysis, ...), image analysis, etc.

Dataset reference:

References:

Pearson K., On lines and planes of closest fit to systems of points in space, Phil. Mag., 2, 11, 1901, 559-572

Hotelling H., Analysis of a complex of statistical variables into principal components, J. Edu. Psy., Vol 24, 1933, 417-441 & 498-520

Rao C.R., The use and interpretation of principal components analysis in applied research, Sankhya, Series A, Vol 26, 1964, 329-357

Jolliffe I. T., Principal Component Analysis, Springer, New York, 1986

Go to PCA window description