Reference datasets

Here is a list of some datasets that have been used to test CuBatch. They are available in the "data" subdirectory of the "CuBatch" directory.

NB: "ms" is an abbreviation for "multiple set": it means you have to import more than one variable from the *.mat file. To do this, choose "multiple set" in the "import data from *.mat" submenu (see the "files" help).

bat.mat

contents: B1, ..., B29.

missing data: no

Methods: PCA (ms), OPA, IV-PCA (with viscosity.mat), PLS (with viscosity.mat)

Description: 144 NIR spectra from a batch process (29 batches at different times)

fluorescence.mat

contents: X, Y

missing data: no.

Methods: Parafac (X), Tucker (X), PCA (X), PLS (X, Y), nPLS (X, Y)

Usemodel: It is suggested to use "Export data to .mat" to export samples 6 and 10.
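The same selection can be sketched outside the GUI. The minimal NumPy illustration below assumes that samples lie along the first mode of X and Y, that "samples 6 and 10" uses MATLAB's 1-based numbering, and that the exported samples are set aside as new data for Usemodel. The helper `export_samples` is hypothetical, and the 4-column shape of Y (one column per analyte) is an assumption.

```python
import numpy as np


def export_samples(X, Y, sample_numbers):
    """Split samples out of X and Y along the first (sample) mode.

    sample_numbers are 1-based (MATLAB/CuBatch style) and are converted
    to 0-based NumPy indices. Returns ((X_kept, Y_kept), (X_out, Y_out)).
    """
    idx = np.asarray(sample_numbers) - 1
    keep = np.setdiff1d(np.arange(X.shape[0]), idx)
    return (X[keep], Y[keep]), (X[idx], Y[idx])


# Synthetic stand-ins with the documented X shape (15 x 66 x 15);
# Y with 4 binary columns (one per analyte) is an assumption.
X = np.random.rand(15, 66, 15)
Y = np.random.randint(0, 2, size=(15, 4))
(X_cal, Y_cal), (X_new, Y_new) = export_samples(X, Y, [6, 10])
print(X_cal.shape, X_new.shape)  # (13, 66, 15) (2, 66, 15)
```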

Description: 15 solutions containing DOPA, hydroquinone, tyrosine and tryptophan in different amounts (two levels) were examined using a Cary Eclipse fluorescence spectrophotometer. The excitation wavelengths ranged from 230 to 300 nm in 5 nm steps, and the emission was measured from 282 to 412 nm in 2 nm steps. Scatter was removed by subtracting a blank from each sample. Six replicates of each solution were analysed, giving six different arrays of dimensions 15 x 66 x 15.
X contains the fluorescence spectra of the first replicate of the data set.
Y contains the concentrations (in coded units, i.e. 0/1) of the first replicate of the data set.
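The array dimensions follow from the wavelength grids. The short check below counts the grid points, assuming both ends of each range are included and that the modes are ordered samples x emission x excitation (the description does not state the order, and both the sample and excitation modes have size 15).

```python
# Excitation: 230 to 300 nm in 5 nm steps (both ends included).
excitation_nm = list(range(230, 301, 5))
# Emission: 282 to 412 nm in 2 nm steps (both ends included).
emission_nm = list(range(282, 413, 2))

n_samples = 15  # 15 solutions per replicate
print(len(excitation_nm), len(emission_nm))  # 15 66
# Assumed mode order: samples x emission x excitation
print((n_samples, len(emission_nm), len(excitation_nm)))  # (15, 66, 15)
```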

Matrice.mat

2D datasets

missing data: no

size: 144 x 700

Methods: PCA, OPA, IV-PCA (with visco.mat), PLS (with visco.mat)

Description: 144 NIR spectra from a batch process (29 batches at different times)
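bat.mat and Matrice.mat carry the same description and the same number of spectra (144), so Matrice.mat may simply be the batches B1 to B29 stacked row-wise; this is an assumption, not something stated here. The hypothetical helper `stack_batches` sketches such stacking with NumPy, given a dict of variables such as the one `scipy.io.loadmat` returns.

```python
import re

import numpy as np


def stack_batches(variables):
    """Stack batch matrices named B1, B2, ... row-wise into one 2D matrix.

    `variables` is a dict such as the one returned by scipy.io.loadmat;
    keys that do not match B<number> (e.g. '__header__') are ignored.
    Batches are ordered by number, so B2 comes before B10.
    Returns (stacked_matrix, ordered_names).
    """
    names = [k for k in variables if re.fullmatch(r"B\d+", k)]
    names.sort(key=lambda k: int(k[1:]))
    blocks = [np.atleast_2d(variables[k]) for k in names]
    if len({b.shape[1] for b in blocks}) != 1:
        raise ValueError("all batches must have the same number of variables")
    return np.vstack(blocks), names
```

Under the assumption above, `stack_batches(scipy.io.loadmat("bat.mat"))` should return a matrix with 144 rows. Sorting by batch number matters because a plain alphabetical sort would put B10 before B2.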

predclim.mat

contents:

  • X (3D, 12x24x2)
  • Y (3D, 12x24x6)

missing data: no.

Methods: Tucker (X or Y), Parafac (X or Y), IV-Tucker (X and Y), PCA (X or Y).

Usemodel:

Description: X

VUBdatatest.mat

contents: X9, X12, X15, X24, X27, spurs.

Methods: PCA (with one X*), OPA (with one X*), parafac2 (with some X*), OPA-3D (with some X* and spurs).

For confidentiality reasons, this is not a real dataset.
The 5 batches (called X9, X12, etc.) have different lengths (from 71 to 143 samples) and the same number of variables (201). The samples can be treated as spectra (even though they look like noise, they carry information derived from real NIR spectra) and the variables as wavelengths. This dataset is particularly suitable for OPA on a single matrix (X15, for instance) and for OPA-3D. In the latter case, spurs (size: 3 x 201) can be used as the initial inputs of the method.
The dataset can also be used for PARAFAC2, since the "slabs" of the cube are not of equal length, and for PCA.
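Whether a set of batches needs PARAFAC2 rather than an ordinary cube can be read from the array shapes alone. The sketch below uses a hypothetical helper, `check_slabs`, on synthetic random slabs: 71 and 143 are the documented extreme lengths, the middle length (100) is made up, and the random values only stand in for the real batches. It checks that all slabs share the 201 variables and that the spurs have one column per variable, and reports whether the lengths are equal.

```python
import numpy as np


def check_slabs(slabs, spurs=None):
    """Check that batch matrices fit the PARAFAC2 layout (hypothetical helper).

    All slabs must have the same number of columns (variables), while their
    row counts (batch lengths) may differ. If initial spurs are given, they
    must have one column per variable.
    Returns (n_variables, lengths, equal_lengths).
    """
    n_vars = {s.shape[1] for s in slabs}
    if len(n_vars) != 1:
        raise ValueError("all slabs must have the same number of variables")
    n = n_vars.pop()
    if spurs is not None and spurs.shape[1] != n:
        raise ValueError("spurs must have one column per variable")
    lengths = [s.shape[0] for s in slabs]
    # Equal lengths would allow an ordinary cube (PARAFAC);
    # unequal lengths call for PARAFAC2.
    return n, lengths, len(set(lengths)) == 1


# Synthetic stand-ins: random values; lengths 71 and 143 come from the text,
# 100 is made up for illustration.
rng = np.random.default_rng(0)
slabs = [rng.random((m, 201)) for m in (71, 100, 143)]
spurs = rng.random((3, 201))
print(check_slabs(slabs, spurs))  # (201, [71, 100, 143], False)
```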
