[R] What BIC is calculated by 'regsubsets'?

Paul Murtaugh murtaugh at stat.oregonstate.edu
Fri Dec 19 20:39:17 CET 2008


The function 'regsubsets' appears to calculate a BIC value that is 
different from that returned by the function 'BIC'.  The latter is 
explained in the documentation, but I can't find an expression for the 
statistic returned by 'regsubsets'.

Incidentally, both of these differ from the BIC that is given in Ramsey
and Schafer's The Statistical Sleuth.  I assume these are all linear
transformations of each other, but I'd like to know the 'regsubsets' 
formula (so that I can develop a way to do all-subsets selection based 
on the AIC rather than the BIC).
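
The linear-transformation guess can be checked for the two documented forms by rebuilding R's BIC from the residual sum of squares. The following is a minimal sketch using only base R. The names 'fit', 'rss', 'k' and 'p' are illustrative and are not part of the script further down, and the Sleuth-style form is the one quoted in this post, not checked against the book.

```r
# Sketch: rebuild BIC() for a Gaussian lm from its residual sum of squares.
fit <- lm(Ozone ~ Wind + Temp, data = airquality)
n   <- length(residuals(fit))          # rows actually used in the fit
p   <- length(coef(fit))               # regression coefficients
k   <- p + 1                           # coefficients plus the error variance
rss <- sum(residuals(fit)^2)

# For a Gaussian linear model, -2*logLik = n*log(rss/n) + n*(1 + log(2*pi))
bic_from_rss <- n * log(rss / n) + n * (1 + log(2 * pi)) + k * log(n)

# Sleuth-style form as quoted in this post, with MSE = rss/(n - p)
bic_sleuth <- n * log(rss / (n - p)) + k * log(n)

all.equal(bic_from_rss, BIC(fit))      # compare the by-hand value with BIC()

# Gap between the two forms: n*(1 + log(2*pi)) - n*log(n/(n - p))
bic_from_rss - bic_sleuth
```

The gap contains a term in p, so across models of different size the two forms are not exactly a constant shift of each other, although the p-dependent part is small when n is much larger than p.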

The following code defines a function that illustrates the issue.
Thanks
-Paul


script.ic <- function() {

library(datasets)
library(leaps)                        # provides regsubsets()
print(names(airquality))              # Ozone Solar.R Wind Temp Month Day

# Fit a model with two predictors
 mod1 <- lm(Ozone ~ Wind + Temp, data=airquality)
 npar <- length(mod1$coef)+1         # no. parameters in fitted model,
                                     #   including s2, is 4
 nobs <- length(mod1$fitted)         # no. of observations = 116
 s2 <- summary(mod1)$sigma^2        # MSE = 477.6371
 logL <- as.vector(logLik(mod1))     # log likelihood = -520.8705

# Use the R function BIC, defined as: -2*log-likelihood + npar*log(nobs)
 tmp1 <- BIC(mod1)                           # 1060.755
 tmp2 <- -2 * logL + npar * log(nobs)        # 1060.755, agrees
 cat(paste("\nFrom R's BIC:",signif(tmp1,5),"(",signif(tmp2,5),
    "obtained 'by hand')\n\n"))

# Now see how 'regsubsets' calculates the BIC
 tmp3 <- regsubsets(Ozone ~ Solar.R + Wind + Temp, data=airquality)
 tmp3.s <- summary(tmp3)

# 'mod1' is the second model in 'tmp3'; what is the formula for this BIC?
 cat("\nThe corresponding model from 'regsubsets':\n")
 print(tmp3.s$which[2,])
 tmp4 <- tmp3.s$bic[2]                       # -82.52875
 cat(paste("\nBIC =",signif(tmp4,5),"\n"))

# Incidentally, the 'rsq' and 'rss' components of tmp3.s do not agree
# with the values in the 'mod1' lm output object.

# Just for kicks, try the formula for Schwarz's BIC from Ramsey & Schafer,
# Statistical Sleuth:  nobs*log(MSE) + npar*log(nobs)
 tmp5 <- 116 * log(477.6371) + 4*log(116)    # 734.6011
 cat(paste("\nStat. Sleuth's BIC =",signif(tmp5,5),"\n"))
 cat("\nUff da!\n")

browser()

}
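
One possible reason the 'rsq' and 'rss' values disagree is that the two fits do not use the same rows. The sketch below assumes that 'regsubsets', like 'lm', drops every row with a missing value in any variable named in its formula (the default na.action); that assumption should be checked against the 'leaps' documentation. The names 'vars_lm', 'vars_subs', 'aq_cc' and 'mod1_cc' are illustrative.

```r
# Sketch: count the rows each fit can use after dropping incomplete cases.
vars_lm   <- c("Ozone", "Wind", "Temp")             # variables in mod1
vars_subs <- c("Ozone", "Solar.R", "Wind", "Temp")  # variables given to regsubsets

n_lm   <- sum(complete.cases(airquality[, vars_lm]))
n_subs <- sum(complete.cases(airquality[, vars_subs]))
print(c(lm = n_lm, regsubsets = n_subs))

# Refit the two-predictor model on the smaller set of rows so that its
# R-squared and RSS can be compared like for like with the regsubsets summary.
aq_cc   <- airquality[complete.cases(airquality[, vars_subs]), ]
mod1_cc <- lm(Ozone ~ Wind + Temp, data = aq_cc)
print(c(rsq = summary(mod1_cc)$r.squared, rss = deviance(mod1_cc)))
```

If the row counts differ, any BIC comparison between 'mod1' and the 'regsubsets' output should use the refitted model, since both the sample size and the residual sum of squares change.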


