## Information criteria for choosing best predictive models

29th May 2012

I usually use 10-fold (non-stratified) cross-validation (CV) to measure the predictive power of my models: it gives consistent results and is easy to perform (at least on smaller datasets).

I just came across Akaike’s Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (BIC). Quoting robjhyndman:

> Asymptotically, minimizing the AIC is equivalent to minimizing the CV value. This is true for any model (Stone 1977), not just linear models. It is this property that makes the AIC so useful in model selection when the purpose is prediction.
>
> …
>
> Because of the heavier penalty, the model chosen by BIC is either the same as that chosen by AIC, or one with fewer terms. Asymptotically, for linear models minimizing BIC is equivalent to leave–v–out cross-validation when v = n[1-1/(log(n)-1)] (Shao 1997).
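
To get a feel for the leave-v-out formula in the quote, here is a small sketch in Python that simply evaluates v = n[1-1/(log(n)-1)] for a few sample sizes. The function name `shao_leave_v_out` is made up for this illustration, and I am assuming `log` means the natural logarithm.

```python
import math


def shao_leave_v_out(n: int) -> float:
    """Evaluate v = n * (1 - 1 / (log(n) - 1)) from the quoted formula.

    Assumption: log is the natural logarithm. The expression only gives
    a value strictly between 0 and n when log(n) - 1 > 1, i.e. n >= 8.
    """
    if n < 8:
        raise ValueError("formula only yields 0 < v < n for n >= 8")
    return n * (1.0 - 1.0 / (math.log(n) - 1.0))


if __name__ == "__main__":
    # Print v and the held-out fraction v / n for a few sample sizes.
    for n in (100, 1_000, 10_000, 1_000_000):
        v = shao_leave_v_out(n)
        print(f"n = {n:>9,d}   v = {v:>12.1f}   v/n = {v / n:.3f}")
```

Under that assumption the held-out fraction v/n grows with n, so the cross-validation scheme that matches BIC leaves out most of the data, which is quite different from the 10% held out in each fold of 10-fold CV.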

I want to try AIC and maybe BIC on my models. Conveniently, both are available in R as the functions `AIC()` and `BIC()`.
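
To make the two criteria concrete, here is a minimal from-scratch sketch in Python (not R's own implementation). It uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where L is the maximized likelihood, k the number of estimated parameters and n the number of observations. The toy data and all helper names are hypothetical, and I am assuming Gaussian errors and counting the error variance as one of the k parameters.

```python
import math


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood, using the ML variance RSS / n."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    sigma2 = rss / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def aic(log_lik, k):
    """AIC = 2k - 2 ln L."""
    return 2.0 * k - 2.0 * log_lik


def bic(log_lik, k, n):
    """BIC = k ln(n) - 2 ln L."""
    return k * math.log(n) - 2.0 * log_lik


def residuals_mean_only(y):
    """Residuals of the intercept-only model y = b0."""
    mean = sum(y) / len(y)
    return [yi - mean for yi in y]


def residuals_line(x, y):
    """Residuals of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical toy data: roughly linear in x with a little noise.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]
n = len(y)

# Each candidate: (residuals, k). k counts coefficients plus the variance.
candidates = {
    "intercept only": (residuals_mean_only(y), 2),
    "linear in x": (residuals_line(x, y), 3),
}

scores = {}
for name, (res, k) in candidates.items():
    ll = gaussian_log_likelihood(res)
    scores[name] = (aic(ll, k), bic(ll, k, n))
    print(f"{name:>15}: AIC = {scores[name][0]:8.2f}   BIC = {scores[name][1]:8.2f}")
```

Lower is better for both criteria. The only difference between them is the penalty term: k ln(n) exceeds 2k as soon as n is at least 8, which is the "heavier penalty" of BIC mentioned in the quote.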