*This practical work is based on an initial document written by Boris Hejblum*

Load the `mixOmics`

package and `liver.toxicity`

data:

```
library(mixOmics)
data("liver.toxicity")
```

As before, we want to predict the albumine level using gene expression data.

- A first PLS model

Consult the`pls()`

help page, and fit a PLS regression model, first using \(r=10\) (10 latent variables).

```
X <- as.matrix(liver.toxicity$gene)
Y <- liver.toxicity$clinic$ALB.g.dL.
n <- nrow(X)
n
```

`## [1] 64`

```
p <- ncol(X)
p
```

`## [1] 3116`

```
r <- 10
pls_albu <- pls(X, Y, ncomp = r, mode = "regression")
```

- Number of latent variables

How many latent variables do we have to keep ?

*[Use the*`perf()`

function]

`CV_albu <- perf(pls_albu, validation = "Mfold")`

```
plot(x = 1:r, y = CV_albu$MSEP, col = "blue", lwd = 2, xlab = "Nb Comp", ylab = "10-fold CV error",
type = "b", main = "Evolution the cross-validation error against the number or component")
```

```
# If we optimize the prediction error, we choose 9 components, but the
# interpretation will be quite difficult. We take 3 components in the
# following: it eases the interpretation and the prediction error is quite
# competitive.
```

- PLS regression characteristics

Fit the PLS regression model with the previously chosen number of latent variables.

What are the explained variance proportions.

```
pls_albu_opt <- pls(X, Y, ncomp = 3, mode = "regression")
pls_albu_opt$explained_variance
```

```
## $X
## comp 1 comp 2 comp 3
## 0.1107910 0.1401058 0.2171452
##
## $Y
## comp 1 comp 2 comp 3
## 1.0000000 0.4716469 0.3574486
```

- Individuals representation

Plot the projection of indivuals on the first two latent variables of \(X\) (see`plotIndiv()`

function), and add the information of measurement time and the paracetamol dose (see`liver.toxicity$treatment`

object) on the plot. Comment.

```
# Dose:
plotIndiv(pls_albu_opt, comp = 1:2, ind.names = FALSE, rep.space = "X-variate",
group = liver.toxicity$treatment$Dose.Group, legend = TRUE, legend.title = "Dose")
```

```
# Time of measurement:
plotIndiv(pls_albu_opt, comp = 1:2, ind.names = FALSE, rep.space = "X-variate",
group = liver.toxicity$treatment$Time.Group, legend = TRUE, legend.title = "Time")
```

```
# Both information on the same graph:
plotIndiv(pls_albu_opt, comp = 1:2, ind.names = FALSE, pch = liver.toxicity$treatment$Time.Group,
rep.space = "X-variate", group = liver.toxicity$treatment$Dose.Group, legend = TRUE,
legend.title = "Dose", legend.title.pch = "Time")
```