This practical work is based on an initial document written by Boris Hejblum

Load the mixOmics package and liver.toxicity data:


As before, we want to predict the albumine level using gene expression data. Furthermore, we want to know what genes are the most related to the albumine level.

  1. A first PLS model
    Consult the spls() help page, and fit an sparse-PLS regression model, first using \(r=10\) (10 latent variables) and keeping \(5\) original variables for each component (see the keepX parameter of the spls() function).

  2. Number of latent variables
    How many latent variables do we have to keep ?
    [Use the “Mfold” cross-validation instead of the “loo” one if computational time is too high]
    Try to inscrease the number of variables per component and study the effect on the choice of the number of components.
    [take for exemple 20 and then 50 variables per component]

  3. sPLS regression characteristics and norm of loadings vector
    Fit the sPLS regression model with the previously chosen parameters.
    Look at the explained variance proportions.
    Print the first coordinates of loadings vectors of \(X\).
    Compute \(L_2\) and \(L_1\) norms of the first loading vector of \(X\).
    Compare with the \(L_2\) and \(L_1\) norms of the first loading vector obtained when 50 variables are kept.
    Compute the covariance between \(Y\) and the first latent variable of \(X\).
    Compare with the covariance between \(Y\) and the first latent variable of \(X\) obtained when 50 variables are kept.
    Give the names of the genes which have been selected to build the first component.

  4. Individuals representation
    Plot the projection of indivuals on the first two latent variables of \(X\), and add the measurement time and the paracetamol dose. Comment.

  5. Variables representation
    Plot the links between \(X\) variables and the \(Y\) variable. Comment.

  6. Predictions
    Predict the learning set observations and compute the empirical error of the PLS predictor. Comment.

  7. Multivariate analysis
    (s)PLS can also be used for multivariate analysis where several \(Y\) are to be explained.
    For example, we could do the same analysis as before, but now with all clinic variables in \(Y\) (of dimension 10).