The PARAFAC model is a decomposition method suited to
both exploratory analysis and curve resolution.
The same model can also be applied to monitoring schemes for batch processes.
Quantitative determination (i.e. regression) is possible in theory, but
it has not been implemented yet.
For more information see the literature.
Several plots are of interest depending on the purpose of the analysis.
Some other plots become available only in specific cases:
| NB
Some plots allow several models to be displayed at once;
when this is not the case (as for the Score/Loadings plots), the user is
asked to choose which model to plot. Whenever projections are present, the corresponding results are given precedence over the calibration (which can be displayed, as an option) and the validation. |
![]() |
Loadings and scores can be displayed in
one-, two- or three-dimensional plots (1D, 2D or 3D).

If the number of components is
insufficient, some plot dimensions are inactive (e.g. for a
two-component PARAFAC model the 3D plots are not available).
After the desired menu is selected, a "PARAFAC Plot Control"
window opens:

The "Axes" frame
It allows the user to choose which component, if any, is plotted along each axis:
![]() |
![]() |
NB. Unless the PARAFAC model was calculated with orthogonality constraints, the axes in the plot are not orthogonal in reality (see Kiers... for more information).
The "Validation & prediction" frame
Its content varies depending on the type of validation (if any) and on whether the model is applied to external data:
Choosing to plot replicates/predictions deactivates the "All" option in the 1D plots.
The "Display options" frame and the preferences menu
The first menu in the frame allows the user to choose the type of marker:
The second menu specifies whether a line should link the points (Continuous) or not (Discrete).
The submenus in the 'Preferences' menu are activated/deactivated depending on the choices made on the main control window:
|
'Show model' -> 'labels' with replicates displayed as green points ![]() |
'Show model' -> 'no' with replicates displayed as scalars ![]() |
![]() |
The coloured hulls enclose the replicates of a 100-replicate bootstrap of a rank 4 PARAFAC model. The black labels represent the final model's scores. |
The other submenus are:
The changes become visible only after the 'Plot' button is pressed.
Explained Variation, PRESS, RMSE
Explained Variation (EV, expressed as a % of
the total variation in the set), Prediction REsiduals Sum of Squares (PRESS)
and Root Mean Squared Error (RMSE) all reflect the goodness of fit. In
particular:

$$\mathrm{EV}\% = 100 \left(1 - \frac{\mathrm{PRESS}}{\mathrm{TSS}}\right)$$

where the PRESS refers to the calibration or to the validation samples/batches and TSS stands for Total Sum of Squares. The RMSE is linked to the PRESS by the relation:

$$\mathrm{RMSE} = \sqrt{\frac{\mathrm{PRESS}}{n}}$$

where n is the number of "non-missing" elements in the array. They therefore give the same type of information, albeit in different measurement units. To plot the overall explained variation the following menu has to be selected
The other submenus plot the EV% against a specific axis (see figure on the side). Subsets can be chosen, as for the residual sum of squares. |
![]() |
![]() |
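As a minimal illustration of how the three quantities relate, the sketch below assumes EV% = 100 (1 - PRESS/TSS) and RMSE = sqrt(PRESS/n), with TSS taken as the sum of squares of the data exactly as passed in. The function name and the flat-list input are hypothetical and are not part of CuBatch.

```python
import math

def fit_statistics(data, model):
    """Return (PRESS, EV%, RMSE) for flat sequences of data and model values.

    Missing elements are marked with None or NaN in `data` and are skipped.
    Assumption: TSS is the sum of squares of the data as supplied.
    """
    press = 0.0
    tss = 0.0
    n = 0
    for x, x_hat in zip(data, model):
        if x is None or math.isnan(x):
            continue  # only "non-missing" elements are counted
        press += (x - x_hat) ** 2
        tss += x ** 2
        n += 1
    ev = 100.0 * (1.0 - press / tss)
    rmse = math.sqrt(press / n)
    return press, ev, rmse

# Toy example: four elements, one of them missing
x = [1.0, 2.0, float("nan"), 4.0]
x_hat = [1.1, 1.9, 3.0, 4.2]
press, ev, rmse = fit_statistics(x, x_hat)
```

Since both EV% and RMSE are simple functions of the PRESS, ranking models by one or by the other gives the same order as long as TSS and n do not change.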
When the model is validated
using full cross-validation, the EV can be computed either on the complete model
or on the predictions of the samples left out at each step. The choice is made
through a requester that appears when subsets are
selected in the first mode:

The default is "Calibration".
A similar choice is to be made when displaying the
explained variation versus a mode other than the first when
test set validation is employed. In this case the user is asked whether the residuals
shall refer to the calibration or to the test set.
Again the default is "Calibration".
Residuals sum of squares (Q-statistics)
This plot involves several
selections besides the choice of the model's rank:
![]() 1) Select mode to plot against in the 'Results' -> 'Residuals' menu |
2) Choose subsets in the various modes. ![]() ![]() |
When the model is validated using full
cross-validation, the residuals can be computed either on the complete model or
on the predictions of the samples left out at each step. The choice is made
through a requester that appears when subsets are
selected in the first mode:

The default is "Calibration".
A similar choice is to be made when displaying the Q-statistic versus a mode
other than the first when test set validation is employed. In this case the user is
asked whether the residuals shall refer to the calibration or to the test set.
Again the default is "Calibration".
If more than one sample is selected and the desired mode is
not the first, one plot per sample is displayed. It is possible to see the
sample label by clicking on the plot with the left mouse button.
The confidence limits for the Q-statistic are computed on the basis of Jackson and
Mudholkar and displayed as light blue
solid/dashed lines. The NOC batches/samples
(the calibration set) are used to compute these limits.

Note: if subsets are selected in modes other than the first, the non-normalised residuals plot actually displays the contribution plot for the Q-statistic.
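For orientation, the Jackson and Mudholkar approximation for the upper limit of the Q-statistic can be sketched as follows. The function takes the eigenvalues of the residual covariance matrix estimated from the NOC set. How CuBatch obtains these eigenvalues for a multi-way array is not described here, so the input and the function name are assumptions.

```python
from statistics import NormalDist

def q_limit(residual_eigenvalues, confidence=0.95):
    """Jackson-Mudholkar upper limit for the Q-statistic.

    residual_eigenvalues: eigenvalues of the residual covariance matrix
    (hypothetical input, estimated from the NOC batches/samples).
    """
    theta1 = sum(residual_eigenvalues)
    theta2 = sum(l ** 2 for l in residual_eigenvalues)
    theta3 = sum(l ** 3 for l in residual_eigenvalues)
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    c = NormalDist().inv_cdf(confidence)  # standard normal quantile
    bracket = (c * (2.0 * theta2 * h0 ** 2) ** 0.5 / theta1
               + 1.0
               + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * bracket ** (1.0 / h0)

# Toy eigenvalues; the two limits correspond to the two lines in the plot
q95 = q_limit([1.0, 0.5, 0.25], 0.95)
q99 = q_limit([1.0, 0.5, 0.25], 0.99)
```

A sample/batch whose Q value exceeds the chosen limit has residuals larger than expected from the NOC set.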
Slab-wise congruence
The congruence (i.e. the cosine) between the real data and the model is plotted
versus the scalars in the selected modes.
A congruence of 1 means that the corresponding slab is perfectly
reproduced by the model, although, because of noise, this limit is hardly ever
attained.
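A minimal sketch of the slab-wise congruence is given below, with each slab unfolded into a flat list of values. The function name and the list-based input are hypothetical; only the cosine itself comes from the description above.

```python
import math

def congruence(data_slab, model_slab):
    """Cosine between a data slab and the corresponding model slab.

    Both slabs are flat sequences of equal length (a slab holds all the
    elements sharing one index in the selected mode).
    """
    dot = sum(x * m for x, m in zip(data_slab, model_slab))
    norm_x = math.sqrt(sum(x * x for x in data_slab))
    norm_m = math.sqrt(sum(m * m for m in model_slab))
    return dot / (norm_x * norm_m)

# Toy slab and a slightly different model reconstruction
slab = [1.0, 2.0, 3.0, 4.0]
noisy_model = [1.1, 1.9, 3.2, 3.9]
c = congruence(slab, noisy_model)
```

Because the cosine ignores the overall scale, a slab that the model reproduces up to a constant factor still has congruence 1; the residual plots remain necessary to judge the magnitude of the misfit.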
When the model is validated using full
cross-validation, the congruence can be computed either on the complete model or
on the predictions of the samples left out at each step. The choice is made
through a requester:

The default is "Calibration".
A similar choice is to be made when displaying the congruence versus a mode
other than the first when test set validation is employed. In this case the user is
asked whether the congruence shall refer to the calibration or to the test set.
Again the default is "Calibration".
Figure a) shows the congruence for the 15 samples of the Fluorescence data set for the rank 3 model. Sample 8 is not well described by this model, and the analysis of the scores and of the concentrations shows that this sample contains only one compound, namely the one described by the fourth component, which is absent from this model.
Figure b) shows the congruence for the different emission wavelengths, also for the rank 3 model. The lowest emission wavelengths are not well described, and the misfit is systematic: the four-factor model (not displayed here) describes these wavelengths much better (although still quite far from a congruence of 1).
a)![]() |
b)![]() |
D-statistic
The D-statistic is the Hotelling
T² statistic computed in a reduced space with R
components instead of the original space x with JK (or JKL, etc.) variables; in other
words, it is the Mahalanobis distance of a
certain sample/batch from the origin of the axes in the model space.
It is used as a diagnostic tool, especially in the field of Multivariate
Statistical Process Control (MSPC): if the D-statistic for a certain batch is
larger than a threshold determined on the basis of an F-distribution, the batch
is considered faulty (post-batch analysis). The
D-statistic limits (set at 95 and 99%) depend on the number of samples in the
NOC (Normal Operating Conditions) data and on the rank of the model.
1)![]() |
2)![]() |
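The D-statistic described above can be sketched in a few lines. The scaling factor I(I - R) / (R(I² - 1)), which makes the value comparable with an F(R, I - R) critical value, follows Nomikos and MacGregor [1]; whether CuBatch uses exactly this form is an assumption. The score covariance is taken about the origin, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def d_statistic(noc_scores, t_new):
    """Scaled Hotelling T^2 of a new batch in the R-component score space."""
    i_noc = len(noc_scores)  # number of NOC batches (I)
    r = len(t_new)           # rank of the model (R)
    # Score covariance about the origin, estimated from the NOC batches
    s = [[sum(row[a] * row[b] for row in noc_scores) / (i_noc - 1)
          for b in range(r)] for a in range(r)]
    t2 = sum(ti * zi for ti, zi in zip(t_new, solve(s, t_new)))
    scale = i_noc * (i_noc - r) / (r * (i_noc ** 2 - 1))
    return t2 * scale  # to be compared with the F(R, I - R) critical value

# Toy NOC scores (6 batches, rank 2) and the scores of a new batch
noc = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0],
       [0.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]
d = d_statistic(noc, [2.0, -2.0])
```

The critical value itself is not computed here; it can be read from an F table or obtained from a statistics library, for example `scipy.stats.f.ppf(0.95, R, I - R)`.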
[1] Nomikos P., MacGregor J.F.,
"Monitoring batch processes using multiway principal component analysis", AIChE
Journal, Vol 40, n°8, 1994, 1361-1373
[2] Westerhuis J.A., Gurden S.P., Smilde A.K., "Generalized
contribution plots in multivariate statistical process monitoring",
Chemometrics and Intelligent Laboratory Systems, Vol 51, 2000, 96-114
Residuals in particular are very powerful diagnostic
tools. Most models assume that the residuals are
independent and identically distributed, possibly according to a normal
distribution centred at 0.
The presence of systematic variation in the residuals may reflect an
inappropriate choice of the rank or more simply an inadequacy of the model in
explaining the data at hand.
A three-factor
model on the fluorescence data (rank 4) yields very systematic
residuals: the model is inadequate. ![]() |
A four-factor model on the same
data yields better predictions, and the residuals are relatively
non-systematic. |
| The
Identity Match Plot is available only when leave-one-out cross-validation
has been used. The scores obtained in prediction (i.e. by projecting the left-out sample/batch on the model computed on the others) are plotted against the scores of the complete model. Because of the uniqueness property of PARAFAC the scores should be identical, and this plot is an excellent diagnostic tool for identifying outliers. See literature... A "PARAFAC Plot Control" window is opened (with an empty "Validation frame") to choose the display options and which factor's scores are to be plotted. |
![]() |
| The
Resample Influence Plot is currently available for the leave-one-out
validation case only. The calling menu allows the user to decide which mode (apart from the first, which should refer to the samples/batches) the plot refers to. It shows the MSE (Mean Squared Error) for the loadings in a specific mode versus the sum of squares of the residuals for the left-out sample/batch when this is projected on the model computed on the remaining samples/batches. Samples/batches in the top right corner yield high residuals and, when eliminated, lead to very different loadings. This may be a strong indication that these samples/batches are outliers. More is to be found in the literature. NB. Calculating the correct MSE requires solving an optimisation problem, and this procedure may be very expensive. Therefore a requester asks the user whether she/he wants to proceed. The same function also calculates the risk. |
![]() |
Risk plot
This plot is available only when resampling methods (leave one out or bootstrap)
have been used to validate the model.
The risk
function for dimension F is defined as:

| Thanks to the uniqueness properties of PARAFAC the models obtained by
leaving out one or more samples should be "identical". The sum of the
congruences between two different models (the rth replicate of leave
one out or bootstrap and the complete model) should be equal to F if
they yield the same factors (provided that the permutational indeterminacy
has been removed). (RF)⁻¹ is a normalisation factor. The figure on the side shows the risk plot for models computed on the Fluorescence data set with 1 to 5 components. The 5-component model is less "stable" (i.e. the loadings of the extracted components vary more depending on the composition of the data set employed to compute the model). It is not visible in this figure, but the risk for 6 components is even higher (~0.038). The minimum is attained for 4 components, which is the correct dimensionality for the problem at hand (Fluorescence data set). |
![]() |
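As a rough illustration of the idea only, the sketch below computes one minus the average factor-wise congruence between each replicate and the complete model, which is zero when every replicate reproduces the complete model exactly. This is an assumed form and not necessarily the definition used by CuBatch (which may, for instance, square the deviations). It also assumes that the replicate factors have already been matched to those of the complete model in order and sign. All names are hypothetical.

```python
import math

def cosine(u, v):
    """Congruence (cosine) between two loading vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def risk(full_loadings, replicate_loadings):
    """One minus the mean factor-wise congruence over all replicates.

    full_loadings: F loading vectors of the complete model.
    replicate_loadings: R lists of F loading vectors, one list per
    replicate, already matched to the complete model (assumption).
    """
    f = len(full_loadings)
    r = len(replicate_loadings)
    total = sum(cosine(rep[k], full_loadings[k])
                for rep in replicate_loadings for k in range(f))
    return 1.0 - total / (r * f)

# Toy example: two factors, two replicates
full = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
stable = [full, full]
unstable = [full, [[1.0, 0.5, 0.0], [0.0, 1.0, 0.2]]]
risk_stable = risk(full, stable)
risk_unstable = risk(full, unstable)
```

With this reading, a rank whose replicates drift away from the complete model gets a larger value, in line with the behaviour described for the 5-component model.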
D-statistic on-line
The D-statistic can be computed in an on-line fashion by filling
in the incomplete sample/batch. It is possible then to detect the occurrence of
a fault during the evolution of the batch itself.
There are several options for filling in the batches; CuBatch supports two:
'zero' and 'current
deviation'. In the first the sample/batch is treated as if it
proceeded like the NOC samples/batches; in the second it is assumed that the
difference of the current batch from the NOC samples/batches remains constant
for the rest of the batch. For more information see the literature.
The user is asked for the fill-in method every time the plot is
requested in the 'advanced' mode and never
in the 'plant' mode.
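The two fill-in options can be sketched as follows, assuming that the batch data are expressed as deviations from the average NOC trajectory, so that a zero deviation means the batch behaves like the NOC average. The function name and the data layout are hypothetical and do not reflect the CuBatch code.

```python
def fill_in(observed, n_times, method="zero"):
    """Complete a batch that has been observed only up to the current time.

    observed: list of rows (one per elapsed time point), each holding the
    deviations of the J variables from the average NOC trajectory.
    n_times: total number of time points in a complete batch.
    """
    n_vars = len(observed[0])
    missing = n_times - len(observed)
    if method == "zero":
        # future deviations are zero: the batch follows the NOC average
        future = [[0.0] * n_vars for _ in range(missing)]
    elif method == "current deviation":
        # the latest deviation persists until the end of the batch
        future = [list(observed[-1]) for _ in range(missing)]
    else:
        raise ValueError(f"unknown fill-in method: {method!r}")
    return [list(row) for row in observed] + future

# Two of four time points observed, two variables
partial = [[0.1, -0.2], [0.3, 0.0]]
filled_zero = fill_in(partial, 4, "zero")
filled_dev = fill_in(partial, 4, "current deviation")
```

The completed batch can then be projected on the model to obtain scores and residuals at the current time.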
The two figures show two possible evolutions: in figure a) no fault occurs; in
figure b) the fault occurs at the very beginning of the batch.
a)![]() |
b)![]() |
|
Like the D-statistic, the Q-statistic can be computed
in an on-line fashion by suitably filling in the incomplete batches (see
the D-statistic section or the
literature for more details). This plot is started from the menu 'Residuals'->'Mode #: Time'->'On-line' ![]() The evolution of the Q-statistic is displayed versus the time scalars. The confidence limits for the RSS-online are based on the Jackson and Mudholkar work. This plot is available (like the other "on-line" plots) only if the last mode is named 'Time' (case insensitive). |
![]() |
The SPE is the Sum of Squares of the
Residuals calculated only at time t.
The evolution of the SPE is plotted versus the scalars in the time mode.
The confidence limits are again based on the Jackson and Mudholkar work.
This plot is available (like the other "on-line" plots) only if the last mode is
named 'Time' (case insensitive).
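A minimal sketch of the SPE trajectory for one batch is shown below, assuming that the residuals are arranged with one row per time point and one column per variable. The function name and the layout are hypothetical.

```python
def spe_trajectory(residuals):
    """Return the SPE (sum of squared residuals) at each time point.

    residuals: list of rows, one per time point, each holding the
    residuals of the J variables at that time for a single batch.
    """
    return [sum(e * e for e in row) for row in residuals]

# Three time points, two variables; the last point has large residuals
residuals = [[0.1, -0.2], [0.0, 0.3], [1.0, -1.0]]
spe = spe_trajectory(residuals)
```

Unlike the Q-statistic, which accumulates residuals over the batch, each SPE value uses only the residuals at its own time point, so a fault shows up at the time it occurs.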
![]() |
![]() |