
Cross Validation Techniques

Holdout Method
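The holdout method splits the data once into a training set and a held-out test set. A minimal sketch using scikit-learn's train_test_split (the breast-cancer dataset and logistic-regression pipeline are illustrative stand-ins, not part of the original notes):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Single split: one training set, one held-out test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print('Holdout accuracy: %.3f' % clf.score(X_test, y_test))
```

The estimate depends on a single random partition, which is what k-fold cross-validation improves on.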

K-fold Cross-Validation

The idea is to split the training set into \(k\) folds (without replacement). \(k-1\) folds are used for training; the remaining fold is used for evaluation.

  • The process is performed \(k\) times; the result is \(k\) models with \(k\) performance estimates.

  • Then, the average performance of the models is calculated. This performance estimate is less sensitive to sub-partitioning of training data.

  • When good hyperparameter values are found, we retrain the model on the whole training dataset and obtain a final performance estimate by evaluating it on the independent test set.

  • One consequence is that each example is used for validation (as part of a test fold) exactly once and for training \(k-1\) times. This generally yields a lower-variance estimate of model performance than the holdout method.

  • Empirical studies suggest that \(k = 10\) folds generally offers a good tradeoff between bias and variance.
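The loop described above can be sketched with scikit-learn's KFold; the dataset and pipeline here are illustrative assumptions, not from the original notes:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = []
for train_idx, val_idx in KFold(n_splits=10, shuffle=True,
                                random_state=1).split(X):
    pipe.fit(X[train_idx], y[train_idx])               # train on k-1 folds
    scores.append(pipe.score(X[val_idx], y[val_idx]))  # score held-out fold

# Average the k estimates to reduce sensitivity to any one partition
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
```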

Stratified k-Fold CV

Stratified k-fold preserves the class proportions of the full training set in each fold, which yields more reliable estimates when the classes are imbalanced.

import numpy as np
from sklearn.model_selection import StratifiedKFold

# pipe_lr, X_train, and y_train are assumed to be defined earlier
# (a fitted pipeline and NumPy training arrays)
kfold = StratifiedKFold(n_splits=10).split(X_train, y_train)

scores = []
for k, (train, test) in enumerate(kfold):
    pipe_lr.fit(X_train[train], y_train[train])          # train on k-1 folds
    score = pipe_lr.score(X_train[test], y_train[test])  # score held-out fold
    scores.append(score)
    print('Fold: %2d, Class dist.: %s, Acc: %.3f' %
          (k+1, np.bincount(y_train[train]), score))

print('\nCV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))
Fold:  1, Class dist.: [256 153], Acc: 0.935
Fold:  2, Class dist.: [256 153], Acc: 0.935
Fold:  3, Class dist.: [256 153], Acc: 0.957
Fold:  4, Class dist.: [256 153], Acc: 0.957
Fold:  5, Class dist.: [256 153], Acc: 0.935
Fold:  6, Class dist.: [257 153], Acc: 0.956
Fold:  7, Class dist.: [257 153], Acc: 0.978
Fold:  8, Class dist.: [257 153], Acc: 0.933
Fold:  9, Class dist.: [257 153], Acc: 0.956
Fold: 10, Class dist.: [257 153], Acc: 0.956

CV accuracy: 0.950 +/- 0.014

k-fold cross-validation scorer

  • Less verbose evaluation.
  • Allows us to distribute the evaluation of the different folds across the machine's CPU cores:

    • n_jobs=2 -> score the folds on 2 cores, for example;
    • n_jobs=-1 -> use all available CPUs in parallel.

from sklearn.model_selection import cross_val_score

scores = cross_val_score(estimator=pipe_lr,
                         X=X_train,
                         y=y_train,
                         cv=10,
                         n_jobs=1)

# reshape(5, 2) only formats the 10 scores compactly for printing
print('CV accuracy scores: %s' % scores.reshape(5, 2))
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores),
                                      np.std(scores)))
CV accuracy scores: [[0.93478261 0.93478261]
 [0.95652174 0.95652174]
 [0.93478261 0.95555556]
 [0.97777778 0.93333333]
 [0.95555556 0.95555556]]
CV accuracy: 0.950 +/- 0.014

Learning Curves

Learning Curve Implemented

  • Plot of model training and validation accuracy as a function of the number of training samples; it helps diagnose whether a model suffers from high bias or high variance.
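A sketch of computing these curves with scikit-learn's learning_curve; the dataset and pipeline are illustrative assumptions (plotting the two mean-score arrays against train_sizes gives the learning curve):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Evaluate at 5 training-set sizes, from 10% to 100% of the data
train_sizes, train_scores, valid_scores = learning_curve(
    estimator=pipe, X=X, y=y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=10, n_jobs=1)

# Mean accuracy across the 10 folds at each training-set size
print(train_sizes)
print(train_scores.mean(axis=1))
print(valid_scores.mean(axis=1))
```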

| Type | Traits | Common Fixes |
| --- | --- | --- |
| High bias | Low training and cross-validation accuracy (underfitting) | Raise the number of parameters (e.g., additional features); lower the degree of regularization in SVMs or logistic regression classifiers |
| High variance | Large gap between training and cross-validation accuracies (overfitting) | Get more data; reduce model complexity; increase the regularization parameter; etc. |

Validation Curves

Validation Curve Implemented

  • Instead of plotting accuracy as a function of training-set size, validation curves plot training and validation accuracy while varying the value of a model parameter, such as the inverse regularization parameter, C, in logistic regression.
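A sketch with scikit-learn's validation_curve, varying C of a logistic-regression step; the dataset, pipeline, and parameter range are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Vary the inverse regularization strength C of the pipeline's
# logistic-regression step
param_range = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
train_scores, valid_scores = validation_curve(
    estimator=pipe, X=X, y=y,
    param_name='logisticregression__C',
    param_range=param_range, cv=10)

# Mean accuracy across the 10 folds for each value of C
print(train_scores.mean(axis=1))
print(valid_scores.mean(axis=1))
```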