How to implement a new model
Implementing a new model in CuBatch requires the writing of new functions, specific for that model, as well as the editing of some of the existing ones whenever the execution of model-specific code is necessary (e.g. fit a model to a certain data set).
1. Create the Model_* directory
Creating a directory ..\CuBatch\Model_* (where the asterisk stands for the model name, e.g. Model_OPA, Model_PARAFAC, Model_Tucker, etc.) for all the files related to the model is not necessary for CuBatch to work properly, but it is consistent with the existing core of implemented methods and makes the files related to each method easier to find.
This directory shall contain those files, related to the model, that are called from a file in the main CuBatch directory, from a file in "..\CuBatch\Common" or from the main workspace (as in callback functions).
The main Model_* directory will contain a \private directory with
files accessible only from Model_* or Model_*\private.
Again, the existence of a \private directory is not necessary for the main
program to work, but it would be consistent with the rest of the existing code.
The file names in the Model_* directory should start with the model name, which,
for the rest of this document will be symbolised by an asterisk:
PARAFACActivatePlots, TuckerPlot, etc.
The names in the private directory are not relevant, as there is no risk of executing them from the wrong directory; nevertheless, naming conventions have been followed for some of these files.
2. Accessing the data
The data is available in five data structures "saved" as application-defined data in the main window; some of them also exist as global variables:
X, cell vector with one or two elements, the first referring to X and the second to Y
Content, cell vector of two elements, with the Content objects of respectively X and Y
ModelIn, structure with the options and the selections for computing a certain model; it also contains the selection of the samples/batches and the variables
ModelOut, structure holding the results of the computation of a model or its application to a new set of data
PlotStruct, structure with the selections for the current plot (if any).
All these can be accessed via the getappdata/setappdata MATLAB functions or, alternatively, using their extended versions getcvdata/setcvdata.
X, ModelIn and ModelOut also exist as global variables and, as such, are accessed by numerous functions within CuBatch [1]. The up-to-date versions of these data structures are nevertheless those saved as application-defined data of the main window, and this way of accessing them should be used for new models as well.
Whenever a callback function needs to access these data, it is sufficient to pass the handle of the main figure as an argument, or even no argument at all: the main window can always be located because its 'userdata' property is set to 'CuBatch'.
NB Other functions within the GUI can access these data structures and
it is therefore suggested to create copies of them within the function and
"save" them in the main window only when an 'OK' (or equivalent) button is
pressed or no error has occurred.
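The access pattern described above can be sketched as follows. This is a minimal illustration, not CuBatch code: it creates a hidden stand-in for the main window, and it assumes that the structures are stored under application-data names equal to their variable names (here 'ModelIn') and that ModelIn.sam is a plain index vector.

```matlab
% Stand-in for the CuBatch main window (illustration only).
hMain = figure('Visible', 'off', 'UserData', 'CuBatch');
setappdata(hMain, 'ModelIn', struct('sam', 1:10));   % assumed key name and content

% Inside a callback: locate the main window through its 'userdata' property,
% so that no handle needs to be passed as an argument.
hFig = findobj(0, 'Type', 'figure', 'UserData', 'CuBatch');

% Work on a local copy of the structure.
ModelIn = getappdata(hFig, 'ModelIn');
ModelIn.sam = ModelIn.sam(1:5);                      % e.g. the user narrows the selection

% Write the copy back only once the user has confirmed.
userPressedOK = true;                                % stands for the 'OK' button callback
if userPressedOK
    setappdata(hFig, 'ModelIn', ModelIn);
end
```

Keeping the edits on a local copy means that a cancelled dialog or an error leaves the structures seen by the rest of the GUI untouched.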
The sample/batch and variable selections are made available through the ModelIn.sam and ModelIn.var fields.
To apply these selections and extract the proper subset of X (and Y) for the calculations, the function DefineArray can be used.
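To make the effect of the selection concrete, the sketch below extracts a subset with plain indexing. It does not call DefineArray, whose signature is not given here, and the layout of the selection fields is an assumption: ModelIn.sam is taken to be an index vector for the first (batch) mode and ModelIn.var a cell vector with one index vector per remaining mode.

```matlab
% Three-way data array: 6 batches x 4 variables x 5 time points (made-up sizes).
X = {rand(6, 4, 5)};

% Assumed layout of the selection fields (hypothetical, for illustration).
ModelIn.sam = [1 2 4 6];          % selected batches
ModelIn.var = {1:3, [1 3 5]};     % selected indices in modes 2 and 3

% Subset of X used for the calculations.
Xsub = X{1}(ModelIn.sam, ModelIn.var{:});
disp(size(Xsub));
```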
Once the model is calculated, the ModelOut structure is to be filled in; two fields are particularly important: ModelOut.modelname and ModelOut.data.
The ModelOut.info field is a structure with model-specific fields and it also needs to be filled in (when required by the other functions of the model).
3. *function files
Eight *function files must be created for every new model:
*ActivatePlots: generates the menus under the 'Results' menu (having tag 'res') associated to the model currently present in memory (either after calculation or loading). It shall automatically handle the change in menus if the model is applied to a new set of data. The callback function of the 'Results' menus must be the *Plot function of the current model.
*Apply: handles the application of a model (in the ModelOut form) to new data. The loading of the new data is done within this function and can be executed using the Load_ExtData function.
The fitting of the existing model to new data is made via the
ApplyModel function, which itself calls the *Pred function.
The results of the projection must be saved in
ModelOut.prediction and
the function must return the updated ModelOut structure.
If the returned ModelOut is empty the procedure will be considered aborted and
the main ModelOut structure will not be updated with the results of the
projection on new data.
*DisplayInfo: displays the information about the current model in the "InfoBox" (i.e. an axes object having tag 'CBInfoFrame'). It must contain, in the first lines, a call to DisplayInfo, which takes care of the information on the data array(s).
All the text displayed in the InfoBox shall be given the tag 'textfig', so that it can be removed when the displayed information needs to be updated.
*Fit: a function fitting the desired model to a certain set of data. The requirements of this function are specified in FitModel.
There can be more than one of these functions for the same model, depending for instance on the algorithm that one wishes to employ (e.g. for the PARAFAC model there are two: PARAFACFit and PARAFACGNFit).
*Model: creates the GUI handling the choices (e.g. number of factors, which algorithm to use, options, constraints, etc.) directly or via other ad-hoc figure-opening functions present in the Model_* directory (see for instance PARAFACConstraints).
The layout of the modelling window is fixed:
The "Model area" is set in the upper left corner.
The "Validation area" is in a frame in the lower left corner.
The lower right part of the figure is reserved to the compute/cancel
(bootstrap in some methods) actions.
Additional functions that are common to most of the methods (namely
preprocessing and constraints) are handled by buttons located in the upper
right corner.
The 'Preferences' menu is utilised for those options that are model specific
(such as the choice of the algorithm).
The 'Help' menu, on the other hand, should open the user-help page for the implemented model in the browser.
The calculation/validation of the model can be performed via the functions FitModel and ValidateModel, which save the results in the corresponding part of the ModelOut structure.
NB. The ModelOut.info field is model-dependent and must be defined and
filled within this function (or one of its subroutines).
*Plot: it handles the plotting of the results stored in the ModelOut structure. Currently the different methods employ various approaches. In general the options can be restricted to two:
the *Plot function calls (depending on the tag of the requesting menu), specific plotting functions that are all present in the \private directory (e.g. PARAFAC);
the plot function handles directly the plot, which is chosen via one of the input parameters (e.g. Tucker)
The axes must be given tag 'cbaxes'.
*Pred: function fitting an existing model (stored in ModelOut.model) to a set of data. The requirements for this function are specified in FitModel.
*Report: function saving the results in a .txt file according to the selections made in the window handled by SaveReport and passed by means of a suitable structure. This function accepts as inputs, besides the aforementioned structure, the handle to the file and the handle to the main figure.
NB the file must not be closed within *Report, as this operation is performed in SaveReport.
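As an illustration of the InfoBox convention described under *DisplayInfo above, the following sketch removes the old text objects by tag and writes new ones. The figure and axes are hidden stand-ins, the displayed strings are made up, and the mandatory initial call to DisplayInfo is omitted because its signature is not given here.

```matlab
% Stand-in for the main window and its InfoBox (illustration only).
hFig  = figure('Visible', 'off');
hInfo = axes('Parent', hFig, 'Tag', 'CBInfoFrame', 'Visible', 'off');
text(0.05, 0.9, 'Old model info', 'Parent', hInfo, 'Tag', 'textfig');

% Refresh: locate the InfoBox by tag and delete the previously displayed text.
hInfo = findobj(hFig, 'Type', 'axes', 'Tag', 'CBInfoFrame');
delete(findobj(hInfo, 'Tag', 'textfig'));

% Write the new information, tagging every text object with 'textfig'.
infoLines = {'Model: MyModel', 'Factors: 2'};      % hypothetical content
for k = 1:numel(infoLines)
    text(0.05, 1 - 0.1 * k, infoLines{k}, 'Parent', hInfo, 'Tag', 'textfig');
end
```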
The eight files relative to PARAFAC are thoroughly commented and can be used as an example of one possible organisation of the files.
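The interplay between *ActivatePlots and *Plot can be sketched as below. This is only an outline under stated assumptions: the menu labels and tags ('scores', 'loadings') are hypothetical, the figure is a hidden stand-in for the main window, and the plotted numbers are placeholders. In CuBatch the 'Callback' of each entry would be the *Plot function of the current model.

```matlab
% Stand-in for the main window with its 'Results' menu (tag 'res').
hFig = figure('Visible', 'off');
hRes = uimenu(hFig, 'Label', 'Results', 'Tag', 'res');

% *ActivatePlots side: one entry per available plot (hypothetical tags).
plotTags = {'scores', 'loadings'};
for k = 1:numel(plotTags)
    uimenu(hRes, 'Label', plotTags{k}, 'Tag', plotTags{k});
end

% *Plot side: dispatch on the tag of the requesting menu
% (in a real callback this could be obtained with get(gcbo, 'Tag')).
requestTag = 'loadings';
hAx = axes('Parent', hFig);
switch requestTag
    case 'scores'
        plot(hAx, 1:3, [2 1 3], 'o');
    case 'loadings'
        plot(hAx, 1:3, [1 2 3], '-');
end

% Set the tag after plotting: plot resets most axes properties,
% including Tag, when NextPlot is 'replace'.
set(hAx, 'Tag', 'cbaxes');
```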
4. Files to modify
Several files need to be modified in order to fully integrate the new model:
Regress_menu; one (or more) line(s) creating the menu calling the *Model function must be added. The parent object must be GlHan.regress. The tag is used by Dispatch_Compute to call the proper function. The new menu's handle is to be saved in GlHan (e.g. as GlHan.modelname) so that it can be activated/deactivated upon the loading of compatible data sets.
Decomp_menu; one (or more) line(s) creating the menu calling the *Model function must be added. The parent object must be GlHan.expl. The tag is used by Dispatch_Compute to call the proper function. The new menu's handle is to be saved in GlHan (e.g. as GlHan.modelname) so that it can be activated/deactivated upon the loading of compatible data sets.
Dispatch_Apply, the case relative to the new model (via the field ModelOut.modelname) must be added so as to call the proper *Apply function.
Dispatch_Compute, the case relative to the new model (via the tag of the calling menu) must be added so as to call the proper *Model function.
ActivatePlots, upon the loading of suitable data the menu GlHan.modelname must be activated; this is done in the conditional block starting with
if ~isempty(X)
by including the menu handle for the new model in the proper condition blocks.
Also the part relative to ModelOut, i.e. starting with
if ~isempty(ModelOut)
%Activate menus depending on the loaded model
if ~isempty(ModelOut(1).modelname),
is to be updated, including the new model case and allowing the call to the *ActivatePlots function.
DisplayModel, the case relative to the new model (via ModelOut.modelname) is to be included, allowing the call for the correct *DisplayInfo function.
SaveReport, the case relative to the new model (via ) must be added so that *Report is called. The change must be made within the conditional block starting with the line
switch ModelOut(1).modelname
RemPath/SetPath, a line relative to the Model_* directory is necessary.
DefaultPlot, a case relative to the new model is to be created after:
try
switch mo(1).modelname
where mo is the current ModelOut
ClearUseModel, a case relative to the new model is to be created after:
try
switch mo(1).modelname
where mo is the current ModelOut
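The kind of additions listed above can be sketched as follows for an exploratory model. Everything named 'MyModel' or 'mymodel' is hypothetical, the parent menu is a stand-in for the real GlHan.expl, and the dispatch blocks only resolve a function name instead of calling it, so that the example runs on its own.

```matlab
% Stand-in for the main window and the parent menu stored in GlHan.expl.
hFig = figure('Visible', 'off');
GlHan.expl = uimenu(hFig, 'Label', 'Exploratory');

% Decomp_menu: new entry, disabled until compatible data are loaded.
GlHan.mymodel = uimenu(GlHan.expl, 'Label', 'MyModel', ...
                       'Tag', 'mymodel', 'Enable', 'off');

% ActivatePlots: enable the entry once suitable data are present.
X = {rand(4, 3, 2)};
if ~isempty(X)
    set(GlHan.mymodel, 'Enable', 'on');
end

% Dispatch_Compute: the tag of the calling menu selects the *Model function.
switch get(GlHan.mymodel, 'Tag')
    case 'mymodel'
        modelFun = 'MyModelModel';
    otherwise
        modelFun = '';
end

% Dispatch_Apply: ModelOut.modelname selects the *Apply function.
ModelOut = struct('modelname', 'MyModel');
switch ModelOut(1).modelname
    case 'MyModel'
        applyFun = 'MyModelApply';
    otherwise
        applyFun = '';
end
```

The same switch on ModelOut(1).modelname is the pattern to extend in DisplayModel, SaveReport, DefaultPlot and ClearUseModel.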
5. Important remarks
The functions FitModel, ValidateModel and ApplyModel are general and fill in the corresponding fields in ModelOut. It is of course possible to implement methods without using these three functions, and sometimes it is necessary to do so (read the next paragraphs).
Among other reasons, their use is nevertheless suggested because they keep the fitting/validation/projection step independent of the definition of ModelOut, and because once the *Fit and *Pred functions are defined all the validation methods (as well as some diagnostics) become automatically available.
The functions OnLineRes and Residual_CL are made to be general and require, as model-dependent input, only the *Pred function.
Thus, defining the *Pred function makes the Q statistics (on-line and off-line) available for any new model.
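The reason why a *Fit/*Pred pair is enough for generic validation and residual statistics can be illustrated with a toy example. The sketch below is not the CuBatch implementation and does not follow the interface specified in FitModel: the two function handles, the "mean only" model and the data are all made up. It runs a leave-one-out loop that knows nothing about the model except how to fit it and how to obtain residuals, and computes Q as the sum of squared residuals of the left-out sample.

```matlab
% Hypothetical fit/pred pair for a trivial "column means" model.
fitFun  = @(X) struct('mean', mean(X, 1));
predFun = @(model, Xnew) Xnew - repmat(model.mean, size(Xnew, 1), 1);  % residuals

X = [1 2; 3 4; 5 6; 7 8];
nSamples = size(X, 1);
Q = zeros(nSamples, 1);

% Generic leave-one-out loop: only fitFun and predFun are model-specific.
for i = 1:nSamples
    train = X([1:i-1, i+1:nSamples], :);
    model = fitFun(train);
    E     = predFun(model, X(i, :));
    Q(i)  = sum(E .^ 2);             % Q statistic of the left-out sample
end
disp(Q');
```

Swapping in another fit/pred pair changes the model but leaves the loop untouched, which is the property the general functions rely on.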
The D-Statistic is currently available for three models (PARAFAC, nPLS and PLS) via the "DLim" functions in the corresponding private directories. The
same functions can be used for other models so long as the scores are properly
computed. The lack of generality is due to the computation of the
contributions to the D statistic, which differs according to the way the
weights/loadings are defined.
The only current limitation to the generality of FitModel, ValidateModel, ApplyModel and OnLineRes concerns PARAFAC2-like algorithms, where the loadings in (at least) one of the modes are stored as a cell vector. No *Pred function has been created for this purpose yet. The problem will be addressed in the next releases of the software.
In case the new model requires additional fields in ModelOut.model, ModelOut.validation or ModelOut.prediction, modify their definition in the DefineModelOut function. This will guarantee the compatibility among the different implemented models.
6. Additional notes
A function Check (alias InitModelIn) in Model_*\private should check that the contents of ModelIn are compatible with the *Model function.
A function CheckPlotStruct (alias CheckP) in Model_*\private should check that the contents of PlotStruct are compatible with the *Plot function.
New files, specific for the single model and handling a GUI, should go in the Model_* directory.
New files, specific for the single model and not handling a GUI, should go in the Model_*\private directory.
New files, used for more than one model must go in the ..\CuBatch\Common directory.
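One possible reading of the compatibility check on ModelIn is sketched below: fields expected by the *Model function that are missing or empty are filled with defaults. The field names and default values are hypothetical and serve only to show the pattern; the actual contents of ModelIn depend on the model.

```matlab
% ModelIn as it might arrive from another model (hypothetical content).
ModelIn = struct('nfactors', []);

% Fields expected by the *Model function, with default values (hypothetical).
defaults = struct('nfactors', 2, 'algorithm', 'als');

% Fill in whatever is missing or empty, leaving valid settings untouched.
names = fieldnames(defaults);
for k = 1:numel(names)
    f = names{k};
    if ~isfield(ModelIn, f) || isempty(ModelIn.(f))
        ModelIn.(f) = defaults.(f);
    end
end
disp(ModelIn);
```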
[1] The presence of two copies of the same variables is somewhat suboptimal and this ambiguity is to be removed in the next versions of the program by eliminating the global variables.