This practical work is based on an initial document written by Boris Hejblum

Load the mixOmics package and liver.toxicity data:

library(mixOmics)
data("liver.toxicity")

As before, we want to predict the albumine level using gene expression data.

  1. A first PLS model
    Consult the pls() help page, and fit a PLS regression model, first using \(r=10\) (10 latent variables).
X <- as.matrix(liver.toxicity$gene)
Y <- liver.toxicity$clinic$ALB.g.dL.
n <- nrow(X)
n
## [1] 64
p <- ncol(X)
p
## [1] 3116
r <- 10
pls_albu <- pls(X, Y, ncomp = r, mode = "regression")
  1. Number of latent variables
    How many latent variables do we have to keep ?
    [Use the perf() function]
CV_albu <- perf(pls_albu, validation = "Mfold")
plot(x = 1:r, y = CV_albu$MSEP, col = "blue", lwd = 2, xlab = "Nb Comp", ylab = "10-fold CV error", 
    type = "b", main = "Evolution the cross-validation error against the number or component")

# If we optimize the prediction error, we choose 9 components, but the
# interpretation will be quite difficult.  We take 3 components in the
# following: it eases the interpretation and the prediction error is quite
# competitive.
  1. PLS regression characteristics
    Fit the PLS regression model with the previously chosen number of latent variables.
    What are the explained variance proportions.
pls_albu_opt <- pls(X, Y, ncomp = 3, mode = "regression")
pls_albu_opt$explained_variance
## $X
##    comp 1    comp 2    comp 3 
## 0.1107910 0.1401058 0.2171452 
## 
## $Y
##    comp 1    comp 2    comp 3 
## 1.0000000 0.4716469 0.3574486
  1. Individuals representation
    Plot the projection of indivuals on the first two latent variables of \(X\) (see plotIndiv() function), and add the information of measurement time and the paracetamol dose (see liver.toxicity$treatment object) on the plot. Comment.
# Dose:
plotIndiv(pls_albu_opt, comp = 1:2, ind.names = FALSE, rep.space = "X-variate", 
    group = liver.toxicity$treatment$Dose.Group, legend = TRUE, legend.title = "Dose")

# Time of measurement:
plotIndiv(pls_albu_opt, comp = 1:2, ind.names = FALSE, rep.space = "X-variate", 
    group = liver.toxicity$treatment$Time.Group, legend = TRUE, legend.title = "Time")

# Both information on the same graph:
plotIndiv(pls_albu_opt, comp = 1:2, ind.names = FALSE, pch = liver.toxicity$treatment$Time.Group, 
    rep.space = "X-variate", group = liver.toxicity$treatment$Dose.Group, legend = TRUE, 
    legend.title = "Dose", legend.title.pch = "Time")