ModelOut structure

The ModelOut structure contains all the information related to a model and, where available, the corresponding statistics.

The part common to the different models is created by the function DefineModelOut. It contains the following fields:


data

Structure with three fields:

  1. 'batches': indexes (in the original array) of the batches/samples (or, in general, horizontal slabs) that have been employed in the calculation of the model.
  2. 'dataset': cbdataset object or cell vector of cbdataset objects (to be changed and unified).
    If the employed method uses only one data matrix (i.e. it is a decomposition method), it contains the data set that was used for the computation (without the excluded samples). For a regression model, it is a cell vector of length two where the first element is the X array and the second the Y array, both in the form of cbdataset objects.
  3. 'variables': a cell vector of doubles or a cell vector of cell vectors of doubles (to be changed and unified as well). If it is a cell vector, the n-th element of this vector contains the indexes in the n-th mode of the original X array of the variables (i.e. any sort of "vertical" slabs in a three-way array, frontal and lateral) used to compute the model.
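
The role of the 'batches' and 'variables' indexes can be illustrated with a small sketch. It is written in Python rather than MATLAB and is not CuBatch code: the array X, the dictionary layout and the helper select are hypothetical. In particular, how the n-th element of 'variables' maps onto the modes is an assumption here: the sketch takes one index list per variable mode (second and third mode of a three-way array).

```python
# Hypothetical stand-in for the original three-way X array:
# 4 batches x 3 variables x 2 time points, stored as nested lists.
X = [[[100 * i + 10 * j + k for k in range(2)] for j in range(3)]
     for i in range(4)]

# Python analogue of the 'batches' and 'variables' fields.
# Indexes are 0-based here, whereas MATLAB indexes are 1-based.
data = {
    "batches": [0, 2, 3],            # horizontal slabs kept (one batch excluded)
    "variables": [[0, 2], [0, 1]],   # assumed: indexes kept in modes 2 and 3
}

def select(X, batches, variables):
    """Return the part of X that was actually used to compute the model."""
    mode2, mode3 = variables
    return [[[X[i][j][k] for k in mode3] for j in mode2] for i in batches]

X_used = select(X, data["batches"], data["variables"])
```

Storing indexes into the original array, rather than a copy of the reduced array only, keeps the link between the model and the samples and variables it was built from.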

Note: In future versions of CuBatch this whole field may become obsolete, as both the ModelIn data structure and the X data structure may be kept within a Model file and the distinction between Models and Sessions technically abandoned.


info

This structure is model dependent and, apart from the field 'algorithm', is defined within the *Model functions or equivalents.
The 'algorithm' field contains the additional information associated with the computed model. Its contents are also set by the *Model function responsible for handling the computation of the model.

Click on the corresponding link to check the content of the different 'info' structures


model

Contains the model's parameters as well as some statistics and quality parameters associated with it.
This field is a structure or, for some models such as PARAFAC and (n)PLS1, a vector of structures.
In the latter case each element is associated with a different dimensionality of the model.
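
As a rough illustration of the vector-of-structures case, the sketch below uses a Python list of dictionaries as an analogue of a MATLAB structure array. The field names nbfactors, xev and xcumpress are those documented for this field; the numerical values and the helper with_factors are made up for the example.

```python
# Hypothetical vector of 'model' structures, one per dimensionality.
# The values are invented purely for illustration.
model = [
    {"nbfactors": 1, "xev": 62.0, "xcumpress": 38.0},
    {"nbfactors": 2, "xev": 85.0, "xcumpress": 15.0},
    {"nbfactors": 3, "xev": 91.0, "xcumpress": 9.0},
]

def with_factors(model, nbfactors):
    """Return the element fitted with the requested number of factors."""
    for element in model:
        if element["nbfactors"] == nbfactors:
            return element
    raise ValueError(f"no model with {nbfactors} factors")

two_factor_model = with_factors(model, 2)
```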

Fieldname | Content | Class
--------- | ------- | -----
bcoeff | Regression coefficients | array of doubles
core | Core array | array of doubles
hotelling | Obsolete? | Obsolete?
nbfactors | Number of factors/components/latent variables of the model | double
stats | Statistics associated with the model.§ | structure
xcumpress | X CUMulated Prediction REsidual Sum of Squares (also known as RSS) | double
xev | X % explained variation | double
xfactors | Loading matrices for the X array | cell vector of doubles. In most models the number of elements corresponds to the number of dimensions of the X array
xpred | Predictions for the X array | array of doubles with the same dimensions as the X used to compute the model
xpreproc | X preprocessing parameters: structure with two fields, 'cen' and 'scal'. The first refers to centring and the second to scaling. Their contents are obtained directly via the nprocess function of the N-way toolbox | The two fields are cell vectors where the n-th element refers to the preprocessing in the n-th mode. If no preprocessing has been performed, it is a cell vector of empty arrays
xpress | X horizontal slabs (i.e. samples/batches) Prediction REsidual Sum of Squares | vector of doubles. The n-th element refers to the n-th slab
xrmse | X Root Mean Squared Error. It is computed over the existing (i.e. non-missing) values only | double
ycumpress | Y CUMulated Prediction REsidual Sum of Squares (also known as RSS) | double
yev | Y % explained variation | double
yfactors | Loading matrices for the Y array | cell vector of doubles. In most models the number of elements corresponds to the number of dimensions of the Y array
ypred | Predictions for the Y array | array of doubles with the same dimensions as the Y used to compute the model
ypreproc | Y preprocessing parameters: structure with two fields, 'cen' and 'scal'. The first refers to centring and the second to scaling. Their contents are obtained directly via the nprocess function of the N-way toolbox | The two fields are cell vectors where the n-th element refers to the preprocessing in the n-th mode. If no preprocessing has been performed, it is a cell vector of empty arrays
ypress | Y horizontal slabs (i.e. samples/batches) Prediction REsidual Sum of Squares | vector of doubles. The n-th element refers to the n-th slab
yrmse | Y Root Mean Squared Error. It is computed over the existing (i.e. non-missing) values only | double

§ The fields of this structure are to be defined clearly and once and for all at the meeting taking place in Brussels on the 23 Oct. 2002
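
The relation between xpress, xcumpress and xrmse can be sketched as follows. This is a Python illustration under stated assumptions, not the CuBatch implementation: it assumes that the cumulated value is the sum of the per-slab values, that missing values are coded as NaN, and it uses a two-way array for brevity. The function names slab_press and rmse are hypothetical.

```python
import math

def slab_press(X, X_pred):
    """Residual sum of squares of each horizontal slab (row), skipping NaNs."""
    press = []
    for row, row_pred in zip(X, X_pred):
        press.append(sum((x - p) ** 2
                         for x, p in zip(row, row_pred)
                         if not math.isnan(x)))
    return press

def rmse(X, X_pred):
    """Root mean squared error over the non-missing entries only."""
    n_present = sum(1 for row in X for x in row if not math.isnan(x))
    return math.sqrt(sum(slab_press(X, X_pred)) / n_present)

nan = float("nan")
X      = [[1.0, 2.0, nan], [4.0, 5.0, 6.0]]   # one missing value
X_pred = [[1.0, 1.0, 3.0], [4.0, 7.0, 6.0]]

xpress = slab_press(X, X_pred)   # one value per slab
xcumpress = sum(xpress)          # assumed: cumulated over all slabs
xrmse = rmse(X, X_pred)          # divides by 5 present values, not 6
```

Dividing by the number of present values rather than by the full array size is what "computed over the existing values only" amounts to in this sketch.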


modelname

String with the model's name. The admitted values are those of the Available_Models variable defined in AvModels.


plot

Vector of structures of type PlotStruct.
Its final function has not been fully implemented yet. Each element should represent a possible plot available in the plant mode, as defined by an advanced user.


prediction

Contains the predicted values for a new set of samples, plus some additional information and statistics.
The fields of this structure and their contents are the following:

Fieldname | Content | Class
--------- | ------- | -----
core | Core array. Obsolete? | array of doubles
data | Cbdataset with the set of data upon which the model is applied | Cbdataset (double)?
nbfactors | Number of factors/components/latent variables of the model | double
stats | Statistics associated with the predictions.§ | structure
xcumpress | X CUMulated Prediction REsidual Sum of Squares | double
xev | X % explained variation in prediction (Q2) | double
xfactors | Predicted X scores matrix | cell element containing an array of doubles
xpred | Predictions for the X array | array of doubles with the same dimensions as the 'data' field
xpress | X horizontal slabs (i.e. samples/batches) Prediction REsidual Sum of Squares | vector of doubles. Each element refers to one sample
xrmse | X Root Mean Squared Error. It is computed over the existing (i.e. non-missing) values only | double
yfactors | Predicted Y scores | cell element containing an array of doubles
ypred | Predictions for the Y array | array of doubles


validation

Contains the model's parameters as determined during the validation procedure.
It also contains some statistics and quality parameters that depend on the validation.
This field is a structure or, for some models such as PARAFAC and (n)PLS1, a vector of structures.
In the latter case each element is associated with a different dimensionality of the model.
The content of the fields may vary according to the employed method.

Fieldname | Content | Class | Ext. §§
--------- | ------- | ----- | -------
bcoeff | Regression coefficients | array of doubles | Yes
core | Core array | array of doubles | Yes
method | Four types of validation are currently available (Table 1): 1. Leave One Out, 2. Naïve Bootstrap, 3. Residuals Bootstrap, 4. Test Set validation | string of chars: 1. 'loo', 2. 'nboo', 3. 'rboo', 4. 'test' | No
nbfactors | Number of factors/components/latent variables of the model | double | No
segments | Its content depends on the method. For methods 1 to 3, each column contains the indexes of the samples/batches employed for a certain sub-model. 'loo': the number of columns is equal to the number of samples; the n-th column contains the values 1 to n - 1 and n + 1 to I (where they exist), I being the number of samples. '*boo': the number of columns is equal to the number of replicates. 'test': one column vector with the samples included in the test set (only) | array or vector of doubles | No
stats | Statistics associated with the model.§ | structure | No
xcumpress | X CUMulated Prediction REsidual Sum of Squares (also known as RSS) | double | No
xev | X % explained variation | double | No
xfactors | Loading matrices for the X array | cell vector of doubles. In most models the number of elements corresponds to the number of dimensions of the X array | Yes¤
xpred | Predictions for the X array | array of doubles with the same dimensions as the X used to compute the model | No
xpress | X horizontal slabs (i.e. samples/batches) Prediction REsidual Sum of Squares | vector of doubles. Each element refers to one segment | No
xrmse | X Root Mean Squared Error. It is computed over the existing (i.e. non-missing) values only | double | No
ycumpress | Y CUMulated Prediction REsidual Sum of Squares (also known as RSS) | double | No
yev | Y % explained variation | double | No
yfactors | Loading matrices for the Y array | cell vector of doubles. In most models the number of elements corresponds to the number of dimensions of the Y array | Yes¤
ypred | Predictions for the Y array | array of doubles with the same dimensions as the Y used to compute the model | Yes¤
ypress | Y horizontal slabs (i.e. samples/batches) Prediction REsidual Sum of Squares | vector of doubles. The n-th element refers to the n-th slab | No
yrmse | Y Root Mean Squared Error. It is computed over the existing (i.e. non-missing) values only | double | No

§ The fields of this structure are to be defined clearly and once and for all at the meeting taking place in Brussels on the 23 Oct. 2002

§§ When a resampling method is used (i.e. Leave One Out or Bootstrap), one mode is added to the double arrays of this field (when the field is a cell, this holds for each of its elements).

¤ E.g. in 'loo' a further slab is added in the third dimension in the first mode (e.g. the scores or the predictions for y), containing the predictions for the left-out samples.
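
The layout of the 'segments' field for 'loo' can be sketched as below. This is a Python illustration, not CuBatch code: it assumes that column n lists every sample except the n-th (the one left out), uses 1-based sample numbers as MATLAB does, and the function name loo_segments is hypothetical.

```python
def loo_segments(n_samples):
    """Build an (I - 1) x I array whose n-th column lists the samples
    kept in the n-th sub-model, i.e. all samples except sample n."""
    columns = [[i for i in range(1, n_samples + 1) if i != n]
               for n in range(1, n_samples + 1)]
    # Transpose so that each sub-model occupies one column.
    return [list(row) for row in zip(*columns)]

segments = loo_segments(4)
for row in segments:
    print(row)
# [2, 1, 1, 1]
# [3, 3, 2, 2]
# [4, 4, 4, 3]
```

Reading the output column by column gives the samples used for each sub-model: 2, 3, 4 when sample 1 is left out, then 1, 3, 4, and so on.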