nPLS1 is a regression model that uses an n-way array to predict a single vector. It
extends the more widespread PLS1 algorithm to the multi-way case (see literature).
Several plots may be of interest, depending on the purpose of the analysis.
Some of them are available for both X and Y,
while others are specific to X:
or to Y:
Some other plots become available only in specific cases:
The number of models computed and validated depends on the number of Y variables that were originally selected.
The predicted variable to look at is chosen directly in the 'Results' menu (red mark).
Whenever the variable name is not in the plot's title, it appears in the InfoBox
as the name of the Y data.

The choice between X and Y plots is made in the following submenu (yellow).
The subsequent menu levels select the actual type of plot.
NB
Some plots need the number of latent variables (LVs) to be specified (e.g. the D-statistic). This is done via the requester shown on the side. Whenever projections are present, the corresponding results are given precedence over the calibration (which can optionally be displayed) and the validation.
Weights and scores can be displayed in plots of up to three dimensions (1D, 2D or 3D). If the number of LVs is insufficient for a certain plot, the corresponding submenu is inactive.

After the desired menu is selected, an
"nPLS1 Plot Control"
window opens:

The "Axes" frame
This frame sets which LV, if any, is plotted along each axis.
The "Validation & prediction" frame
Its content varies depending on the type of validation (if any) and on whether the model is applied to external data:
Choosing to plot replicates/predictions deactivates the "All" option in the 1D plots.
NB Due to the intrinsic rotational indeterminacy, the replicates normally form an "arc" centred on the origin of the axes and do not "cluster" around a specific point. The option of displaying them was kept because a suitable rotation of the replicates to the same model is expected to be implemented in future versions of the software.
The "Display options" frame and the preferences menu
The first menu in the frame selects the type of marker:
The second menu specifies whether a line should link the points (Continuous) or not (Discrete).
The submenus in the 'Preferences' menu are activated/deactivated depending on the choices made in the main control window:
'Show model' -> 'labels', with replicates displayed as green points (X scores).
'Show model' -> 'no', with replicates displayed as scalars (Y scores).
The green hulls are relative to the replicates. The black labels represent the final model's scores.
The other submenus are:
The changes become visible only after pressing the 'Plot' button.
Explained Variation, PRESS, RMSE
Explained Variation (EV, expressed as a % of the total variation in the set), Prediction REsiduals Sum of Squares (PRESS) and Root Mean Squared Error (RMSE) all reflect the goodness of the fit. In particular:

$$\mathrm{EV}\% = 100 \left(1 - \frac{\mathrm{PRESS}}{\mathrm{TSS}}\right)$$

where the PRESS refers to the calibration or to the validation samples/batches and TSS stands for Total Sum of Squares. The RMSE is linked to the PRESS by the relation:

$$\mathrm{RMSE} = \sqrt{\frac{\mathrm{PRESS}}{n}}$$

where n is the number of "non-missing" elements in the array. They therefore give the same type of information, albeit in different measurement units.
These plots are available for both X and y. The plots on X are also defined when new data are projected on a model; in that case, for obvious reasons, these values are not available for y.
To plot the overall explained variation, the corresponding menu has to be selected. The other submenus plot the EV% against a specific axis. Subsets can be chosen, as for the residual sum of squares.
When the model is validated using full cross-validation, the EV can be computed either on the complete model or on the predictions of the samples left out at each step. The choice is made through a requester that appears when subsets are selected in the first mode.
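The relation between PRESS, EV% and RMSE can be sketched numerically as follows. This is only an illustration, assuming the usual definitions EV% = 100 (1 - PRESS/TSS) and RMSE = sqrt(PRESS/n); the function name `fit_statistics`, the NaN encoding of missing elements and the use of already centred data for the TSS are assumptions of this sketch, not a description of the CuBatch code.

```python
import math


def fit_statistics(measured, predicted):
    """Return (PRESS, EV%, RMSE) over the non-missing elements.

    Assumptions of this sketch: missing elements are NaN in `measured`,
    and the data are already centred, so TSS is the plain sum of squares.
    """
    press = 0.0
    tss = 0.0
    n = 0
    for x, x_hat in zip(measured, predicted):
        if math.isnan(x):
            continue  # missing elements do not contribute to any statistic
        press += (x - x_hat) ** 2
        tss += x ** 2
        n += 1
    ev = 100.0 * (1.0 - press / tss)
    rmse = math.sqrt(press / n)
    return press, ev, rmse


# Hypothetical values: four elements, one of them missing.
press, ev, rmse = fit_statistics([1.0, -2.0, float("nan"), 1.0],
                                 [0.5, -1.5, 0.0, 1.0])
print(press, round(ev, 2), round(rmse, 4))
```

EV% is dimensionless, while the RMSE is expressed in the units of the data, which is why the two plots convey the same information on different scales.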
Residual sum of squares (Q-statistic)
This plot involves several selections besides the choice of the model's rank:
1) Select the mode to plot against in the 'Results' -> 'Var. name' -> 'X plots' -> 'Residuals' menu.
2) Choose subsets in the various modes.
When the model is validated using full cross-validation, the residuals can be computed either on the complete model or on the predictions of the samples left out at each step. The choice is made through a requester that appears when subsets are selected in the first mode.

The default is "Calibration".
A similar choice has to be made when test set validation is employed and the Q-statistic is displayed versus a mode other than the first. In this case the requester asks whether the residuals shall refer to the calibration or to the test set.
Again the default is "Calibration".
If more than one sample is selected and the desired mode is
not the first, one plot per sample is displayed. It is possible to see the
sample label by clicking on the plot with the left mouse button.
The confidence limits for the Q-statistic are computed following Jackson and Mudholkar and displayed as light blue solid/dashed lines. The NOC batches/samples (the calibration set) are used to compute these limits.
NB It must be kept in mind that the residuals in the (n)PLS case are unlikely to be normally distributed. Although the confidence limits are computed on the basis of the first moments of the residuals, so that this is partially accounted for, they should be interpreted with great care.
Note: if subsets are selected in modes other than the first, the non-normalised residuals plot actually displays the contribution plot to the Q-statistic.
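As an illustration of the quantities involved, the sketch below computes the Q-statistic of each sample and an upper limit using the Jackson and Mudholkar approximation as it is usually stated. The function names are hypothetical, and the sketch assumes that the eigenvalues passed to `jackson_mudholkar_limit` are those of the covariance matrix of the unfolded calibration (NOC) residuals; how CuBatch obtains them is not described here.

```python
import math
from statistics import NormalDist


def q_statistic(residual_rows):
    """Sum of squared residuals of each sample (one unfolded row per sample)."""
    return [sum(e * e for e in row) for row in residual_rows]


def jackson_mudholkar_limit(eigenvalues, confidence=0.95):
    """Approximate upper limit for Q from the residual eigenvalues.

    theta_i is the sum of the i-th powers of the eigenvalues, i.e. the
    first three moments of the residual variation.
    """
    theta1 = sum(eigenvalues)
    theta2 = sum(lam ** 2 for lam in eigenvalues)
    theta3 = sum(lam ** 3 for lam in eigenvalues)
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    z = NormalDist().inv_cdf(confidence)  # standard normal quantile
    term = (z * math.sqrt(2.0 * theta2 * h0 ** 2) / theta1
            + 1.0
            + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * term ** (1.0 / h0)


# Hypothetical residuals (2 samples x 2 unfolded variables) and eigenvalues.
print(q_statistic([[1.0, 2.0], [0.0, -1.0]]))
print(jackson_mudholkar_limit([0.5, 0.3, 0.2], 0.95))
print(jackson_mudholkar_limit([0.5, 0.3, 0.2], 0.99))
```

The approximation relies on a normal quantile, which is one reason why the limits deserve caution when the residuals are clearly non-normal.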
Slab-wise congruence
The congruence (i.e. the cosine of the angle) between the real data and the model is plotted
versus the scalars in the selected modes.
A congruence of 1 means that the corresponding slab is perfectly
recovered by the model, although, due to the noise, this limit is hardly ever
attained.
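A minimal sketch of the congruence for one slab is given below. It assumes that the data slab and the corresponding slab of the model reconstruction have both been flattened into plain lists of equal length; the function name `congruence` is hypothetical.

```python
import math


def congruence(data_slab, model_slab):
    """Cosine of the angle between a data slab and its model reconstruction."""
    dot = sum(x * m for x, m in zip(data_slab, model_slab))
    norm_x = math.sqrt(sum(x * x for x in data_slab))
    norm_m = math.sqrt(sum(m * m for m in model_slab))
    return dot / (norm_x * norm_m)


# A reconstruction proportional to the data is fully congruent,
# an orthogonal one has zero congruence.
print(congruence([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(congruence([1.0, 0.0], [0.0, 1.0]))
```

Because the cosine ignores the overall scale, a slab can have a congruence close to 1 and still show sizeable residuals if its magnitude is poorly reproduced.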
When the model is validated using full
cross-validation, the congruence can be computed either on the complete model or
on the predictions of the samples left out at each step. The choice is made
through a requester:

Figures a) and b) show the congruence for the 66 emission wavelengths of the Fluorescence data set in a 4 LV model for the variables 'DOPA' and 'Tyro'.
When using nPLS1, different models can be obtained on the same X depending on the predicted variable. It is therefore not surprising that some systematic variation (likely connected to other compounds) is left in the residuals (figure b) and that the various compounds lead to different congruence profiles (here for the emission mode).
a) 'DOPA'
b) 'Tyro'
D-statistic
The D-statistic is the Hotelling T² statistic computed in a reduced space with R
components instead of on x with its JK (or JKL, etc.) variables; in other
terms, it is the Mahalanobis distance of a
certain sample/batch from the origin of the axes in the model space.
It is used as a diagnostic tool especially in the field of Multivariate
Statistical Process Control (MSPC): if the D-statistic for a certain batch is
larger than a threshold determined on the basis of an F-distribution, the batch
is considered faulty (post-batch analysis). The
D-statistic limits (set at 95 and 99%) depend on the number of samples in the
NOC (Normal Operating Conditions) data and on the rank of the model.
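The computation can be sketched as follows, under stated assumptions: the NOC scores are centred, the covariance of the scores is estimated from the NOC samples, and the limit uses the scaling R(I² - 1) / (I(I - R)) of the F(R, I - R) critical value that is commonly reported for new batches in the batch-monitoring literature. All function names are hypothetical, the critical value `f_critical` must be supplied by the user (e.g. from statistical tables), and whether CuBatch uses exactly this scaling is not stated in this page.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def score_covariance(noc_scores):
    """Covariance of the NOC scores (I samples x R components), assumed centred."""
    n_samples = len(noc_scores)
    n_comp = len(noc_scores[0])
    return [[sum(t[p] * t[q] for t in noc_scores) / (n_samples - 1)
             for q in range(n_comp)] for p in range(n_comp)]


def d_statistic(t_new, cov):
    """Squared Mahalanobis distance of a score vector from the origin."""
    w = solve(cov, t_new)  # w = cov^-1 t_new
    return sum(ti * wi for ti, wi in zip(t_new, w))


def d_limit(n_samples, n_comp, f_critical):
    """Limit for a new sample, given the F(R, I - R) critical value."""
    scale = n_comp * (n_samples ** 2 - 1) / (n_samples * (n_samples - n_comp))
    return scale * f_critical


# Hypothetical NOC scores: 4 samples, 2 latent variables.
noc = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
cov = score_covariance(noc)
print(d_statistic([1.0, 2.0], cov))
```

The full covariance matrix is used because the nPLS scores are not necessarily orthogonal, so the LVs cannot simply be scaled one at a time.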
[1] Nomikos P., MacGregor J.F.,
"Monitoring batch processes using multiway principal component analysis", AIChE
Journal, Vol 40, n°8, 1994, 1361-1373
[2] Westerhuis J.A., Gurden S.P., Smilde A.K.,
"Generalized contribution plots in multivariate statistical process monitoring",
Chemometrics and Intelligent Laboratory Systems, Vol 51, 2000, 96-114
[3] Nomikos P., MacGregor J.F., "Multi-way partial least squares in monitoring
batch processes", Chemometrics and Intelligent Laboratory Systems, Vol 30, 1995,
97-108
Residuals in particular are very powerful diagnostic tools. Most models work under the assumption that the residuals are independent and identically distributed, possibly according to a normal distribution centred on 0.
The presence of systematic variation in the residuals may reflect an inappropriate choice of the rank or, more simply, an inadequacy of the model in explaining the data at hand. As nPLS1 does not maximise the explained variation in the X array, the residuals can retain some systematic variation, which makes them somewhat more difficult to use.
A three LV model on the fluorescence data (rank 4) for predicting 'Tyro' yields very systematic residuals.
A four LV model on the same data, also predicting 'Tyro', yields better predictions; the residuals are relatively non-systematic and of much smaller magnitude.
Predicted vs Measured
This plot shows the predicted values versus the measured ones.
The display options and the number of LVs to use can be chosen via the "nPLS1 Plot Control" window that opens after clicking on the menu.
Replicates and predictions can also be plotted when available.
This plot is not available when new data is projected on the model.
The green bisecting line represents the optimum, i.e. predictions equal to the measured values.
The black labels always identify the model, while the red ones locate the predictions after the leave-one-out procedure (as in the figure) or the replicates from any resampling method.

t vs u
The nPLS (as well as PLS) regression coefficients define a linear relationship
between the scores t of the X array and the scores u of the
Y array (for (n)PLS1, the y vector).
The "t vs u" plot can help in determining the correct number of components: when the
relationship between predictors and predictands becomes random (e.g. the t vs u plot
resembles a scatter-shot) there is likely "no model" between X and y.
It may then be better to use fewer LVs than the number displayed in the plot, so as to
avoid overfitting.
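The visual check can be complemented by a number, for instance the correlation between t and u for each LV. The sketch below is only an illustration of that idea: the function name `pearson` and the example scores are hypothetical, and nothing here implies that the software reports this value.

```python
import math


def pearson(t, u):
    """Pearson correlation between the X scores t and the y scores u of one LV."""
    n = len(t)
    mean_t = sum(t) / n
    mean_u = sum(u) / n
    cov = sum((a - mean_t) * (b - mean_u) for a, b in zip(t, u))
    var_t = sum((a - mean_t) ** 2 for a in t)
    var_u = sum((b - mean_u) ** 2 for b in u)
    return cov / math.sqrt(var_t * var_u)


# Hypothetical scores: the first LV shows a clear linear relation,
# the second behaves like a scatter-shot.
scores = {
    1: ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]),
    2: ([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]),
}
for lv, (t, u) in scores.items():
    print(lv, round(pearson(t, u), 3))
```

A correlation close to zero for a given LV suggests that this component adds little to the prediction of y, in line with the scatter-shot appearance described above.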
The display options are available via the standard "nPLS1 Plot Control" window.
a) There is an evident correlation between the scores in X and those in y.
b) The u scores seem to vary independently from the t scores. The linear model (represented by the line) does not capture any systematic variation: 6 LVs are too many.
Predictions plot
It is equivalent to the predicted vs measured plot, with the predictions for the
new data projected on the model placed on the diagonal (i.e. for these samples
only, the predictions are used on the x-axis as well).
D-statistic on-line
The D-statistic can be computed in an on-line fashion by filling
in the incomplete sample/batch. It is then possible to detect the occurrence of
a fault during the evolution of the batch itself.
There are several options for filling in the batches; CuBatch supports two:
'zero' and 'current deviation'. In the first one the sample/batch is treated as if it
proceeded like the NOC samples/batches; in the second it is assumed that the
difference of the current batch from the NOC samples/batches remains constant
for the rest of the batch. For more information see the literature.
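The two fill-in options can be sketched as follows. The sketch assumes that the batch is stored as deviations from the NOC mean trajectory (one list of J variables per time point), so that "proceeding like the NOC batches" corresponds to zero future deviations; the function name `fill_in_batch` and this data layout are assumptions made for the illustration.

```python
def fill_in_batch(observed, n_times, method="zero"):
    """Complete a partially observed batch up to `n_times` time points.

    `observed` holds the deviations from the NOC mean trajectory measured
    so far: one list of J values per elapsed time point.
    """
    n_vars = len(observed[0])
    if method == "zero":
        # The rest of the batch follows the NOC mean trajectory.
        future = [0.0] * n_vars
    elif method == "current deviation":
        # The last observed deviation persists until the end of the batch.
        future = list(observed[-1])
    else:
        raise ValueError("unknown fill-in method: " + method)
    n_missing = n_times - len(observed)
    return [list(row) for row in observed] + [list(future) for _ in range(n_missing)]


# Hypothetical batch: 2 of 4 time points observed, 2 variables.
partial = [[1.0, -2.0], [0.5, -1.0]]
print(fill_in_batch(partial, 4, "zero"))
print(fill_in_batch(partial, 4, "current deviation"))
```

The completed batch can then be projected on the model like a finished one, which gives the scores needed for the on-line D-statistic at the current time point.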
The fill-in method is requested every time the plot is called in the 'advanced' mode and never in the 'plant' mode.
The two figures show two possible evolutions: in figure a) no fault occurs, while in figure b) the fault occurs at the very beginning of the batch.
The Q-statistic can be computed (like the D-statistic) in an on-line fashion by adequately filling in the incomplete batches (see the D-statistic for more details, or the literature). This plot is started from the 'On-line' menu indicated by the light blue circle.
The evolution of the Q-statistic is displayed versus the time scalars. The confidence limits for the on-line RSS are based on the work of Jackson and Mudholkar.
This plot is available (like the other "on-line" plots) only if the last mode is named 'Time' (case insensitive).
The SPE is the sum of squares of the residuals calculated only at time t.
The evolution of the SPE is plotted versus the scalars in the time mode.
The confidence limits are again based on the work of Jackson and Mudholkar.
This plot is available (like the other "on-line" plots) only if the last mode is
named 'Time' (case insensitive).
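The difference between the SPE and the Q-statistic can be illustrated with a short sketch. It assumes that the residuals of one batch are arranged as one list of J values per time point; the function name `spe_profile` and the numbers are hypothetical.

```python
def spe_profile(residuals):
    """SPE at each time point: sum of squared residuals over the variables only."""
    return [sum(e * e for e in row) for row in residuals]


# Hypothetical residuals of one batch: 3 time points x 2 variables.
residuals = [[1.0, -1.0], [0.0, 2.0], [0.5, 0.5]]
spe = spe_profile(residuals)
print(spe)       # one value per time point
print(sum(spe))  # summing over time gives the residual sum of squares of the batch
```

The SPE therefore localises a deviation in time, whereas the residual sum of squares accumulates the deviations over the time points included.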