
Cross Validation Techniques

Holdout Method
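The holdout method splits the data once into a training set and a held-out test set. A minimal sketch using scikit-learn's train_test_split (the breast-cancer dataset and logistic-regression pipeline are illustrative stand-ins, not part of the original notes):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Single split: one training set, one held-out test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print('Holdout accuracy: %.3f' % clf.score(X_test, y_test))
```

The estimate depends on a single random partition, which is what k-fold cross-validation improves on.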

K-fold Cross-Validation

The idea is to split the training set into \(k\) folds (without replacement). \(k-1\) folds are used for training; the remaining fold is used for evaluation.

  • The process is performed \(k\) times; the result is \(k\) models with \(k\) performance estimates.

  • Then, the average performance of the models is calculated. This performance estimate is less sensitive to sub-partitioning of training data.

  • When good hyperparameter values are found, we retrain the model on the whole training dataset and obtain a final performance estimate by evaluating it on the independent test set.

  • One consequence is that each example is used for validation (as part of a test fold) exactly once and for training \(k-1\) times. This generally yields a lower-variance estimate of model performance than the holdout method.

  • Empirical studies suggest that \(k = 10\) folds generally offers a good tradeoff between bias and variance.
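The loop described above can be sketched with scikit-learn's KFold; the dataset and pipeline here are illustrative assumptions, not from the original notes:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = []
for train_idx, val_idx in KFold(n_splits=10, shuffle=True,
                                random_state=1).split(X):
    pipe.fit(X[train_idx], y[train_idx])               # train on k-1 folds
    scores.append(pipe.score(X[val_idx], y[val_idx]))  # score held-out fold

# Average the k estimates to reduce sensitivity to any one partition
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
```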

Stratified k-Fold CV

Stratified k-fold preserves the class proportions of the full training set in each fold, which yields more reliable estimates when the classes are imbalanced.

import numpy as np
from sklearn.model_selection import StratifiedKFold

# pipe_lr, X_train, and y_train are assumed to be defined earlier
# (a fitted pipeline and NumPy training arrays)
kfold = StratifiedKFold(n_splits=10).split(X_train, y_train)

scores = []
for k, (train, test) in enumerate(kfold):
    pipe_lr.fit(X_train[train], y_train[train])          # train on k-1 folds
    score = pipe_lr.score(X_train[test], y_train[test])  # score held-out fold
    scores.append(score)
    print('Fold: %2d, Class dist.: %s, Acc: %.3f' %
          (k+1, np.bincount(y_train[train]), score))

print('\nCV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
Fold:  1, Class dist.: [256 153], Acc: 0.935
Fold:  2, Class dist.: [256 153], Acc: 0.935
Fold:  3, Class dist.: [256 153], Acc: 0.957
Fold:  4, Class dist.: [256 153], Acc: 0.957
Fold:  5, Class dist.: [256 153], Acc: 0.935
Fold:  6, Class dist.: [257 153], Acc: 0.956
Fold:  7, Class dist.: [257 153], Acc: 0.978
Fold:  8, Class dist.: [257 153], Acc: 0.933
Fold:  9, Class dist.: [257 153], Acc: 0.956
Fold: 10, Class dist.: [257 153], Acc: 0.956

CV accuracy: 0.950 +/- 0.014

k-fold cross-validation scorer

  • Less verbose evaluation.
  • Allows us to distribute the evaluation of the different folds across the machine's CPU cores:

    • n_jobs=2 -> score the folds on 2 cores, for example;
    • n_jobs=-1 -> use all available CPUs in parallel.

from sklearn.model_selection import cross_val_score

scores = cross_val_score(estimator=pipe_lr,
                         X=X_train,
                         y=y_train,
                         cv=10,
                         n_jobs=1)

# reshape(5, 2) only formats the 10 scores compactly for printing
print('CV accuracy scores: %s' % scores.reshape(5, 2))
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores),
                                      np.std(scores)))
CV accuracy scores: [[0.93478261 0.93478261]
 [0.95652174 0.95652174]
 [0.93478261 0.95555556]
 [0.97777778 0.93333333]
 [0.95555556 0.95555556]]
CV accuracy: 0.950 +/- 0.014

Learning Curves

Learning Curve Implemented

  • Plot of model training and validation accuracy as a function of the number of training samples; it helps diagnose whether a model suffers from high bias or high variance.
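A sketch of computing these curves with scikit-learn's learning_curve; the dataset and pipeline are illustrative assumptions (plotting the two mean-score arrays against train_sizes gives the learning curve):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Evaluate at 5 training-set sizes, from 10% to 100% of the data
train_sizes, train_scores, valid_scores = learning_curve(
    estimator=pipe, X=X, y=y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=10, n_jobs=1)

# Mean accuracy across the 10 folds at each training-set size
print(train_sizes)
print(train_scores.mean(axis=1))
print(valid_scores.mean(axis=1))
```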

| Type | Traits | Common Fixes |
| --- | --- | --- |
| High bias | Low training and cross-validation accuracy (underfitting) | Raise the number of parameters (e.g., additional features); lower the degree of regularization in SVMs or logistic regression classifiers |
| High variance | Large gap between training and cross-validation accuracies (overfitting) | Get more data; reduce model complexity; increase the regularization parameter; etc. |

Validation Curves

Validation Curve Implemented

  • Instead of plotting accuracy as a function of training-set size, validation curves plot training and validation accuracy while varying the value of a model parameter, such as the inverse regularization parameter, C, in logistic regression.
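A sketch with scikit-learn's validation_curve, varying C of a logistic-regression step; the dataset, pipeline, and parameter range are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Vary the inverse regularization strength C of the pipeline's
# logistic-regression step
param_range = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
train_scores, valid_scores = validation_curve(
    estimator=pipe, X=X, y=y,
    param_name='logisticregression__C',
    param_range=param_range, cv=10)

# Mean accuracy across the 10 folds for each value of C
print(train_scores.mean(axis=1))
print(valid_scores.mean(axis=1))
```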