This practical work is based on an initial document written by Boris Hejblum.

Load the mixOmics package and liver.toxicity data:
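A minimal sketch of this loading step, assuming mixOmics is already installed (it is distributed through Bioconductor). The three components inspected below are the ones used in the rest of this practical work.

```r
# Load the package and the example dataset shipped with it
library(mixOmics)
data(liver.toxicity)

# Inspect the components used in this practical work
dim(liver.toxicity$gene)         # gene expression matrix (predictors X)
colnames(liver.toxicity$clinic)  # clinical measurements (candidate responses Y)
head(liver.toxicity$treatment)   # dose and time information for each animal
```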

1 PLS

As before, we want to predict the albumin level using gene expression data.

1.1 A first PLS model

Consult the pls() help page, and fit a PLS regression model, first using \(r=10\) (10 latent variables).
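One possible way to fit this first model is sketched below. It assumes that the albumin level is stored in the clinic column named "ALB.g.dL." (check colnames(liver.toxicity$clinic) if this fails); the object names X, Y and pls_fit are only illustrative.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
# Assumption: albumin is the clinic column named "ALB.g.dL."
Y <- liver.toxicity$clinic[, "ALB.g.dL."]

# PLS in regression mode with r = 10 latent variables
pls_fit <- pls(X, Y, ncomp = 10, mode = "regression")
pls_fit

# Latent variables (scores) of X: one column per component
dim(pls_fit$variates$X)
```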

1.2 Number of latent variables

How many latent variables should we keep?
[Use the perf() function]
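A hedged sketch of the cross-validation step follows. The location of the Q2 criterion in the object returned by perf() has changed between mixOmics versions, so the code tries both places; which one applies to your installation is an assumption to verify with names(cv). A common rule of thumb is to keep adding components while Q2 stays above 0.0975.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
Y <- liver.toxicity$clinic[, "ALB.g.dL."]  # assumed albumin column
pls_fit <- pls(X, Y, ncomp = 10, mode = "regression")

# Leave-one-out cross-validation (deterministic, but may take a while)
cv <- perf(pls_fit, validation = "loo", progressBar = FALSE)
names(cv)

# Q2 per component: recent versions store it under $measures,
# older versions directly under $Q2.total
q2 <- if (!is.null(cv$measures)) cv$measures$Q2.total$summary else cv$Q2.total
print(q2)
```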

1.3 Individuals representation

Plot the projection of the individuals on the first two latent variables of \(X\) (see the plotIndiv() function), and add the measurement time and the paracetamol dose (see the liver.toxicity$treatment object) to the plot. Comment.
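The sketch below shows one way to encode both pieces of information: the dose as colour (group) and the time as the label of each point (ind.names). It assumes that liver.toxicity$treatment contains columns named Dose.Group and Time.Group; the legend title is only a guess at the unit.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
Y <- liver.toxicity$clinic[, "ALB.g.dL."]  # assumed albumin column
pls_fit <- pls(X, Y, ncomp = 10, mode = "regression")

# Assumed column names of the treatment data frame
dose <- factor(liver.toxicity$treatment$Dose.Group)
time <- as.character(liver.toxicity$treatment$Time.Group)

# Individuals in the space spanned by the first two X latent variables:
# colour = dose, label = measurement time
plotIndiv(pls_fit, comp = c(1, 2), rep.space = "X-variate",
          group = dose, ind.names = time,
          legend = TRUE, legend.title = "Dose")
```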

1.4 Variables representation

Plot the links between the \(X\) variables and the \(Y\) variable (see the plotVar() function). Comment.

2 sPLS

As before, we want to predict the albumin level using gene expression data. In addition, we want to know which genes are the most related to the albumin level.

2.1 A first sPLS model

Consult the spls() help page, and fit a sparse PLS regression model, first using \(r=10\) (10 latent variables) and keeping \(5\) original variables for each component (see the keepX parameter of the spls() function).
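A possible starting point is sketched below, again assuming that albumin is the "ALB.g.dL." column. keepX takes one value per component, hence rep(5, 10). Counting the non-zero loadings is a quick check that the sparsity constraint was applied as intended.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
Y <- liver.toxicity$clinic[, "ALB.g.dL."]  # assumed albumin column

# Sparse PLS: 10 components, 5 genes kept on each component
spls_fit <- spls(X, Y, ncomp = 10, keepX = rep(5, 10), mode = "regression")

# Number of genes with a non-zero loading on each component
n_selected <- colSums(spls_fit$loadings$X != 0)
n_selected
```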

2.2 Number of latent variables

How many latent variables should we keep?
[Use the “Mfold” cross-validation instead of the “loo” one if computational time is too high]
Try increasing the number of variables kept per component and study the effect on the choice of the number of components.
[take for example 20 and then 50 variables per component]
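The comparison can be sketched as a loop over the values of keepX. This is only an illustration: the 5-fold setting and the seed are arbitrary choices, get_q2 is a hypothetical helper written for this example, and the location of Q2 in the perf() output depends on the installed mixOmics version.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
Y <- liver.toxicity$clinic[, "ALB.g.dL."]  # assumed albumin column
ncomp <- 10

# Hypothetical helper: fetch Q2 whatever the mixOmics version
get_q2 <- function(cv) {
  if (!is.null(cv$measures)) cv$measures$Q2.total$summary else cv$Q2.total
}

set.seed(1)  # M-fold partitions are random
keep_values <- c(5, 20, 50)
q2_by_keepX <- lapply(keep_values, function(k) {
  fit <- spls(X, Y, ncomp = ncomp, keepX = rep(k, ncomp), mode = "regression")
  cv <- perf(fit, validation = "Mfold", folds = 5, progressBar = FALSE)
  get_q2(cv)
})
names(q2_by_keepX) <- paste0("keepX=", keep_values)
q2_by_keepX
```

Because the folds are drawn at random, the selected number of components can vary from one run to another; repeating the cross-validation gives an idea of that variability.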

2.3 Individuals representation

Plot the projection of the individuals on the first two latent variables of \(X\), and add the measurement time and the paracetamol dose. Comment.

2.4 Variables representation

Plot the links between \(X\) variables and the \(Y\) variable. Comment.
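A minimal sketch of the correlation circle for a sparse model is given below. The values ncomp = 2 and keepX = c(5, 5) are illustrative, not the parameters you should have selected in 2.2. With an sPLS fit, only the selected genes should appear on the plot, which keeps it readable.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
Y <- liver.toxicity$clinic[, "ALB.g.dL."]  # assumed albumin column

# Illustrative parameters: 2 components, 5 genes per component
spls_fit <- spls(X, Y, ncomp = 2, keepX = c(5, 5), mode = "regression")

# Correlation circle on the first two components
plotVar(spls_fit, comp = c(1, 2))
```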

2.5 Predictions

Predict the learning set observations and compute the empirical error of the PLS predictor. Comment.
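One way to compute the empirical (training) error is sketched here, with illustrative parameters. predict() returns, among other things, an array $predict indexed by sample, response and number of components used. Keep in mind that an error computed on the learning set is optimistic compared with a cross-validated one.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
Y <- liver.toxicity$clinic[, "ALB.g.dL."]  # assumed albumin column

# Illustrative parameters, to be replaced by the ones chosen in 2.2
spls_fit <- spls(X, Y, ncomp = 2, keepX = c(5, 5), mode = "regression")

# Predict the learning set itself
pred <- predict(spls_fit, newdata = X)
dim(pred$predict)  # samples x responses x components

# Predictions of the single response using both components
y_hat <- pred$predict[, 1, 2]

# Empirical mean squared error on the learning set
mse_train <- mean((Y - y_hat)^2)
mse_train
```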

2.6 sPLS regression characteristics and norm of loadings vector

Fit the sPLS regression model with the previously chosen parameters.
Print the first coordinates of loadings vectors of \(X\).
Compute \(L_2\) and \(L_1\) norms of the first loading vector of \(X\).
Compare with the \(L_2\) and \(L_1\) norms of the first loading vector obtained when 50 variables are kept.
Compute the covariance between \(Y\) and the first latent variable of \(X\).
Compare with the covariance between \(Y\) and the first latent variable of \(X\) obtained when 50 variables are kept.
Give the names of the genes which have been selected to build the first component.
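The steps above can be sketched as follows, using keepX = 5 as a stand-in for the previously chosen value and a single component for brevity. The helpers norms and cov_first are hypothetical names introduced for this example. Since mixOmics normalises loading vectors, the \(L_2\) norm is expected to be close to 1 in both fits, so any difference should show up in the \(L_1\) norm.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
Y <- liver.toxicity$clinic[, "ALB.g.dL."]  # assumed albumin column

fit5  <- spls(X, Y, ncomp = 1, keepX = 5,  mode = "regression")
fit50 <- spls(X, Y, ncomp = 1, keepX = 50, mode = "regression")

# First coordinates of the first X loading vector (mostly zeros)
head(fit5$loadings$X[, 1])

# Hypothetical helper: L1 and L2 norms of the first X loading vector
norms <- function(fit) {
  u1 <- fit$loadings$X[, 1]
  c(L1 = sum(abs(u1)), L2 = sqrt(sum(u1^2)))
}
norm_table <- rbind(keepX5 = norms(fit5), keepX50 = norms(fit50))
norm_table

# Hypothetical helper: covariance between Y and the first X latent variable
cov_first <- function(fit) drop(cov(fit$variates$X[, 1], Y))
c(keepX5 = cov_first(fit5), keepX50 = cov_first(fit50))

# Genes selected to build the first component
selected_genes <- selectVar(fit5, comp = 1)$X$name
selected_genes
```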

3 Multivariate analysis

(s)PLS can also be used for multivariate analysis, where several \(Y\) variables are to be explained.
For example, we could do the same analysis as before, but now with all the clinical variables in \(Y\) (of dimension 10).
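A hedged sketch of this multivariate variant: the whole clinic table is passed as \(Y\). The number of components and the keepX values below are arbitrary illustrative choices and would need to be tuned as in section 2.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
Y_all <- liver.toxicity$clinic  # all 10 clinical variables

# Illustrative parameters: 2 components, 50 genes per component
spls_multi <- spls(X, Y_all, ncomp = 2, keepX = c(50, 50), mode = "regression")

# Loadings of the clinical variables on each component
spls_multi$loadings$Y

# Correlation circle showing selected genes and clinical variables together
plotVar(spls_multi, comp = c(1, 2))
```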