*This practical work is based on an initial document written by Boris Hejblum*

Load the `mixOmics` package and the `liver.toxicity` data:
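A minimal sketch of this loading step. It assumes that `mixOmics` is already installed and that the albumin level is stored in the `ALB.g.dL.` column of `liver.toxicity$clinic`; check `colnames(liver.toxicity$clinic)` if that name does not match your version of the data.

```r
library(mixOmics)

# Load the example data set shipped with the package
data(liver.toxicity)

# Predictors: gene expression (samples in rows, genes in columns)
X <- liver.toxicity$gene

# Response: albumin level, kept as a one-column matrix
# (assumption: the column is named "ALB.g.dL.")
Y <- as.matrix(liver.toxicity$clinic[, "ALB.g.dL.", drop = FALSE])

dim(X)
dim(Y)
```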

As before, we want to predict the albumin level using gene expression data.

Consult the `pls()` help page and fit a PLS regression model, starting with \(r=10\) (10 latent variables).
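One possible way to fit this model is sketched below. The object names (`X`, `Y`, `pls_fit`) are illustrative, and the `ALB.g.dL.` column name is an assumption about the data set.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
# Assumption: albumin is the "ALB.g.dL." column of the clinical table
Y <- as.matrix(liver.toxicity$clinic[, "ALB.g.dL.", drop = FALSE])

# PLS in regression mode with r = 10 latent variables
pls_fit <- pls(X, Y, ncomp = 10, mode = "regression")

# Latent variables (scores) of X: one row per sample, one column per component
dim(pls_fit$variates$X)

# Loading vectors of X: one row per gene, one column per component
dim(pls_fit$loadings$X)
```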

How many latent variables should we keep?

*[Use the `perf()` function]*
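A hedged sketch of the cross-validation step with `perf()`. The number of folds and the seed are arbitrary choices, and where the criteria are stored in the returned object depends on the installed `mixOmics` version, so the code checks both layouts.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
# Assumption: albumin is the "ALB.g.dL." column of the clinical table
Y <- as.matrix(liver.toxicity$clinic[, "ALB.g.dL.", drop = FALSE])

pls_fit <- pls(X, Y, ncomp = 10, mode = "regression")

set.seed(1)  # the folds are drawn at random
pls_perf <- perf(pls_fit, validation = "Mfold", folds = 5,
                 progressBar = FALSE)

# Recent releases nest the criteria under $measures,
# older ones expose them at the top level of the object
q2 <- if (!is.null(pls_perf$measures)) {
  pls_perf$measures$Q2.total$summary
} else {
  pls_perf$Q2.total
}
print(q2)
```

A common rule of thumb in the `mixOmics` documentation is to stop adding components once \(Q^2\) falls below 0.0975. `plot(pls_perf, criterion = "Q2")` should display the same information graphically.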

Plot the projection of the individuals on the first two latent variables of \(X\) (see the `plotIndiv()` function), and add the measurement time and the paracetamol dose (see the `liver.toxicity$treatment` object) to the plot. Comment.
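The plot could be produced along the following lines. This sketch assumes that `liver.toxicity$treatment` contains the columns `Time.Group` and `Dose.Group`; the choice of colour for time and plotting symbol for dose is arbitrary.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
# Assumption: albumin is the "ALB.g.dL." column of the clinical table
Y <- as.matrix(liver.toxicity$clinic[, "ALB.g.dL.", drop = FALSE])
trt <- liver.toxicity$treatment

pls_fit <- pls(X, Y, ncomp = 10, mode = "regression")

# Samples in the space spanned by the first two X latent variables:
# colour encodes the measurement time, symbol encodes the dose
plotIndiv(pls_fit, comp = c(1, 2), rep.space = "X-variate",
          ind.names = FALSE,
          group = trt$Time.Group,
          pch = as.factor(trt$Dose.Group),
          legend = TRUE,
          legend.title = "Time", legend.title.pch = "Dose")
```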

Plot the links between the \(X\) variables and the \(Y\) variable (see the `plotVar()` function). Comment.
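A possible call to `plotVar()` is sketched here. Gene labels are switched off because several thousand names would be unreadable; only the response is labelled. The `ALB.g.dL.` column name is an assumption.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
# Assumption: albumin is the "ALB.g.dL." column of the clinical table
Y <- as.matrix(liver.toxicity$clinic[, "ALB.g.dL.", drop = FALSE])

pls_fit <- pls(X, Y, ncomp = 10, mode = "regression")

# Correlation circle on components 1 and 2:
# no labels for the genes (first block), a label for Y (second block)
plotVar(pls_fit, comp = c(1, 2), var.names = c(FALSE, TRUE))
```

Genes that lie close to the circle and in the same (or opposite) direction as the response are positively (or negatively) correlated with it on these two components. The `cutoff` argument of `plotVar()` can hide variables near the centre if the plot is too crowded.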

As before, we want to predict the albumin level using gene expression data. In addition, we now want to know which genes are most strongly related to the albumin level.

Consult the `spls()` help page and fit a sparse PLS regression model, starting with \(r=10\) (10 latent variables) and keeping \(5\) original variables for each component (see the `keepX` parameter of the `spls()` function).
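A minimal sketch of such a sparse fit. `keepX` takes one value per component, hence the `rep(5, 10)`; the object names and the `ALB.g.dL.` column name are assumptions.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
# Assumption: albumin is the "ALB.g.dL." column of the clinical table
Y <- as.matrix(liver.toxicity$clinic[, "ALB.g.dL.", drop = FALSE])

# Sparse PLS: 10 components, 5 genes kept on each of them
spls_fit <- spls(X, Y, ncomp = 10, keepX = rep(5, 10), mode = "regression")

# Number of non-zero loadings per component
colSums(spls_fit$loadings$X != 0)
```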

How many latent variables should we keep?

*[Use the “Mfold” cross-validation instead of the “loo” one if the computation time is too long]*

Try to increase the number of variables per component and study the effect on the choice of the number of components.

*[For example, take 20 and then 50 variables per component]*
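The comparison across sparsity levels can be sketched as a loop over `keepX`. The fold count, the seed and the helper name `get_q2` are illustrative choices, and the location of the \(Q^2\) values in the `perf()` output depends on the `mixOmics` version.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
# Assumption: albumin is the "ALB.g.dL." column of the clinical table
Y <- as.matrix(liver.toxicity$clinic[, "ALB.g.dL.", drop = FALSE])

# Hypothetical helper: fetch Q2 whichever layout the perf object uses
get_q2 <- function(cv) {
  if (!is.null(cv$measures)) cv$measures$Q2.total$summary else cv$Q2.total
}

keep_values <- c(5, 20, 50)
cv_by_keepX <- list()

set.seed(1)  # the folds are drawn at random
for (k in keep_values) {
  fit <- spls(X, Y, ncomp = 10, keepX = rep(k, 10), mode = "regression")
  cv_by_keepX[[as.character(k)]] <- perf(fit, validation = "Mfold",
                                         folds = 5, progressBar = FALSE)
}

# Q2 per component for each number of kept variables
lapply(cv_by_keepX, get_q2)
```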

Plot the projection of the individuals on the first two latent variables of \(X\), and add the measurement time and the paracetamol dose. Comment.

Plot the links between \(X\) variables and the \(Y\) variable. Comment.

Predict the learning set observations and compute the empirical error of the PLS predictor. Comment.
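A hedged sketch of the prediction step, using the mean squared error as the empirical error. The number of components `ncomp_kept` is a placeholder for whatever value cross-validation suggested, not a recommended value.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
# Assumption: albumin is the "ALB.g.dL." column of the clinical table
Y <- as.matrix(liver.toxicity$clinic[, "ALB.g.dL.", drop = FALSE])

pls_fit <- pls(X, Y, ncomp = 10, mode = "regression")

# Placeholder: replace with the number of components you decided to keep
ncomp_kept <- 3

# Predict the observations used to fit the model
pred <- predict(pls_fit, newdata = X)

# pred$predict is an array: samples x responses x number of components used
y_hat <- pred$predict[, 1, ncomp_kept]

# Empirical (training) mean squared error
mse_train <- mean((Y[, 1] - y_hat)^2)
mse_train
```

Because the same observations are used for fitting and for evaluation, this error is optimistic compared with a cross-validated estimate.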

Fit the sPLS regression model with the previously chosen parameters.

Print the first coordinates of the loading vectors of \(X\).

Compute \(L_2\) and \(L_1\) norms of the first loading vector of \(X\).

Compare with the \(L_2\) and \(L_1\) norms of the first loading vector obtained when 50 variables are kept.
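The loading vectors and their norms could be inspected as follows. The sketch refits two models with 5 and 50 variables per component and a placeholder number of components; `loading_norms` is a hypothetical helper. For a vector with unit \(L_2\) norm and \(k\) non-zero entries, the \(L_1\) norm lies between \(1\) and \(\sqrt{k}\).

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
# Assumption: albumin is the "ALB.g.dL." column of the clinical table
Y <- as.matrix(liver.toxicity$clinic[, "ALB.g.dL.", drop = FALSE])

# Placeholder: replace with the number of components you decided to keep
ncomp_kept <- 3

spls_5  <- spls(X, Y, ncomp = ncomp_kept, keepX = rep(5, ncomp_kept),
                mode = "regression")
spls_50 <- spls(X, Y, ncomp = ncomp_kept, keepX = rep(50, ncomp_kept),
                mode = "regression")

# First coordinates of the X loading vectors (mostly zeros under sparsity)
head(spls_5$loadings$X)

# Hypothetical helper: L2 and L1 norms of the first X loading vector
loading_norms <- function(fit) {
  u1 <- fit$loadings$X[, 1]
  c(L2 = sqrt(sum(u1^2)), L1 = sum(abs(u1)))
}

norms_5  <- loading_norms(spls_5)
norms_50 <- loading_norms(spls_50)
rbind(keepX_5 = norms_5, keepX_50 = norms_50)
```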

Compute the covariance between \(Y\) and the first latent variable of \(X\).

Compare with the covariance between \(Y\) and the first latent variable of \(X\) obtained when 50 variables are kept.
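A short sketch of the covariance comparison. It uses the raw response and the first column of `variates$X`, which is computed from the internally centred and scaled \(X\); the number of components is a placeholder.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
# Assumption: albumin is the "ALB.g.dL." column of the clinical table
Y <- as.matrix(liver.toxicity$clinic[, "ALB.g.dL.", drop = FALSE])

# Placeholder: replace with the number of components you decided to keep
ncomp_kept <- 3

spls_5  <- spls(X, Y, ncomp = ncomp_kept, keepX = rep(5, ncomp_kept),
                mode = "regression")
spls_50 <- spls(X, Y, ncomp = ncomp_kept, keepX = rep(50, ncomp_kept),
                mode = "regression")

# Covariance between Y and the first latent variable of X
cov_5  <- cov(Y[, 1], spls_5$variates$X[, 1])
cov_50 <- cov(Y[, 1], spls_50$variates$X[, 1])

c(keepX_5 = cov_5, keepX_50 = cov_50)
```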

Give the names of the genes which have been selected to build the first component.
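The selected genes can be listed with `selectVar()`, as sketched below. The values of `ncomp_kept` and `keep_kept` are placeholders for the parameters chosen earlier.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
# Assumption: albumin is the "ALB.g.dL." column of the clinical table
Y <- as.matrix(liver.toxicity$clinic[, "ALB.g.dL.", drop = FALSE])

# Placeholders: replace with the parameters you chose
ncomp_kept <- 3
keep_kept  <- 5

spls_fit <- spls(X, Y, ncomp = ncomp_kept,
                 keepX = rep(keep_kept, ncomp_kept), mode = "regression")

# Genes with a non-zero loading on the first component
genes_comp1 <- selectVar(spls_fit, comp = 1)$X$name
genes_comp1
```

The same names can be recovered directly as the row names of `spls_fit$loadings$X` whose first-column entry is non-zero.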

(s)PLS can also be used for multivariate analysis where several \(Y\) are to be explained.

For example, we could repeat the same analysis with all the clinical variables in \(Y\) (which then has 10 columns).
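A minimal sketch of the multivariate case, passing the whole clinical table as \(Y\). The number of components and the value of `keepX` are arbitrary placeholders.

```r
library(mixOmics)
data(liver.toxicity)

X <- liver.toxicity$gene
# All clinical variables as a multivariate response
Y_all <- liver.toxicity$clinic

# Placeholders: 3 components, 50 genes kept on each of them
spls_multi <- spls(X, Y_all, ncomp = 3, keepX = rep(50, 3),
                   mode = "regression")

# One loading per clinical variable and per component
dim(spls_multi$loadings$Y)
```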