1 Random Forests basics

Load the dataset liver.toxicity available in the mixOmics package:

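# install.packages('mixOmics')  # if unavailable on CRAN: BiocManager::install('mixOmics')
library(mixOmics)
data(liver.toxicity)                    # loads the liver.toxicity list
x <- liver.toxicity$gene                # gene expression matrix (predictors)
y <- liver.toxicity$clinic$ALB.g.dL.    # albumin level (response)
x[1:6, 1:6]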
  1. Load the randomForest package and consult the help page of the randomForest() function.


    Using the randomForest() function, build a Bagging predictor, named bag, made of maximal trees. Keep all other parameters at their default values for now.
    Determine its OOB error from the printed output.

  2. Now build a random forest predictor, named rf, which randomly selects \(p/3\) (the default value) variables at each node before optimizing the split.
    Compare its OOB error with the previous one.
    Apply the plot() function to the object rf.

  3. Tune the number of trees in the forest, while keeping the number of variables selected at each node fixed at its default value.
    Use the previous plot to guide your choice of tested values.

  4. Now tune the number of variables selected at each node, while keeping the number of trees fixed at the value found in question 3.

  5. Build a random forest predictor with the following parameter values:

    (replace = FALSE, sampsize = nrow(x), mtry = ncol(x), ntree = 10, maxnodes = 5)

    What are the characteristics of this RF?
    Look carefully at the forest component of the resulting object (which is a list) and figure out what its content means.
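
The questions above can be approached along the following lines. This is only a minimal sketch, not a reference solution: the seed, the mtry grid, and the value ntree_sel are arbitrary placeholders introduced for illustration, and the object name rf5 is hypothetical. Bagging is obtained here by setting mtry to the total number of variables.

```r
library(randomForest)
library(mixOmics)                 # provides the liver.toxicity data
data(liver.toxicity)

x <- liver.toxicity$gene
y <- liver.toxicity$clinic$ALB.g.dL.

set.seed(1)                       # arbitrary seed, for reproducibility only

# Question 1: Bagging = all p variables are candidates at every node
bag <- randomForest(x, y, mtry = ncol(x))
print(bag)                        # "Mean of squared residuals" is the OOB error

# Question 2: random forest with the regression default mtry = floor(p / 3)
rf <- randomForest(x, y)
print(rf)
plot(rf)                          # OOB error against the number of trees

# OOB mean squared error after the last tree, for a direct comparison
c(bagging = tail(bag$mse, 1), rf = tail(rf$mse, 1))

# Questions 3 and 4: compare OOB errors over a small grid.
# ntree_sel and mtry_grid are placeholders; choose them from plot(rf).
ntree_sel <- 300
mtry_grid <- c(10, 100, 1000, ncol(x))
oob_by_mtry <- sapply(mtry_grid, function(m) {
  fit <- randomForest(x, y, mtry = m, ntree = ntree_sel)
  tail(fit$mse, 1)                # OOB error once all trees are grown
})
names(oob_by_mtry) <- mtry_grid
oob_by_mtry

# Question 5: no resampling, no variable sampling, small trees
rf5 <- randomForest(x, y, replace = FALSE, sampsize = nrow(x),
                    mtry = ncol(x), ntree = 10, maxnodes = 5)
getTree(rf5, k = 1)               # split variables and split points of tree 1
str(rf5$forest, max.level = 1)    # overview of the forest component
```

Since OOB errors depend on the random draws, values obtained with another seed will differ slightly, so small gaps between settings should not be over-interpreted.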

2 Prediction error estimate using cross-validation

The goal of this section is to estimate the prediction error of a RF and of a regression tree on the Liver toxicity data using 10-fold cross-validation.

  1. Randomly choose which observation belongs to each fold of the 10-fold cross-validation.

  2. Build a RF and a CART tree on all Liver toxicity data except those belonging to the first fold.

  3. Predict the observations belonging to the first fold with both predictors (use the predict() method for that).

  4. Compute the 10-fold cross-validation error estimate for the RF and the CART tree. Comment.
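
The four steps above can be sketched as a single loop over the folds. This is one possible layout under stated assumptions, not the only valid one: the CART tree is fitted with rpart() from the rpart package (an assumed choice, since the text does not name an implementation), the seed is arbitrary, and the names fold, dat, pred_rf, pred_cart and cv_error are introduced here for illustration. The error is computed as the mean squared error pooled over all held-out predictions.

```r
library(randomForest)
library(rpart)                    # CART implementation (assumed choice)
library(mixOmics)                 # provides the liver.toxicity data
data(liver.toxicity)

x <- liver.toxicity$gene
y <- liver.toxicity$clinic$ALB.g.dL.
dat <- data.frame(y = y, x)       # rpart() works through the formula interface

set.seed(1)                       # arbitrary seed, for reproducibility only
K <- 10

# Step 1: assign each observation to one of K folds at random
fold <- sample(rep(1:K, length.out = nrow(x)))

pred_rf <- pred_cart <- numeric(nrow(x))
for (k in 1:K) {
  test <- fold == k

  # Step 2: fit both predictors on every fold except fold k
  rf_k   <- randomForest(x[!test, ], y[!test])
  cart_k <- rpart(y ~ ., data = dat[!test, ])

  # Step 3: predict the held-out observations of fold k
  pred_rf[test]   <- predict(rf_k, newdata = x[test, ])
  pred_cart[test] <- predict(cart_k, newdata = dat[test, ])
}

# Step 4: cross-validated mean squared error of each method
cv_error <- c(rf = mean((y - pred_rf)^2), cart = mean((y - pred_cart)^2))
cv_error
```

Setting k to 1 and running the loop body once reproduces questions 2 and 3 on their own. Averaging the per-fold errors instead of pooling the predictions is an equally common convention and gives a close but not identical value when fold sizes differ.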