[R] Regression with factor having1 level

Robert McGehee rmcgehee at gmail.com
Fri Mar 11 16:03:01 CET 2016

In case this is helpful for anyone, I think I've coded a satisfactory
function answering my problem (of handling formulas containing 1-level
factors) by hacking liberally at the model.matrix code to remove any
model terms for which the contrast fails. As it's a problem I've come
across a lot (since my data frames have factors and lots of missing
values), adding support for 1-level factors might be a nice item for
the R Wishlist. I suppose a key question is, does anyone ever _want_
to see the error "contrasts can be applied only to factors with 2 or
more levels", or should the contrasts function just add a column of
all zeros (or ones) to the design matrix and let the modelling
functions handle that the same way it does any other zero-variance

Anyway, my function below:

lmresid <- function(formula, data) {
    mf <- model.frame(formula, data=data, na.action=na.exclude)
    omit <- attr(mf, "na.action")
    t <- terms(mf)
    contr.funs <- as.character(getOption("contrasts"))
    namD <- names(mf)
    for (i in namD) if (is.character(mf[[i]]))
        mf[[i]] <- factor(mf[[i]])
    isF <- vapply(mf, function(x) is.factor(x) || is.logical(x), NA)
    isF[1] <- FALSE
    isOF <- vapply(mf, is.ordered, NA)
    for (nn in namD[isF])
        if (is.null(attr(mf[[nn]], "contrasts"))) {
            noCntr <- try(contrasts(mf[[nn]]) <- contr.funs[1 +
isOF[nn]], silent=TRUE)
            if (inherits(noCntr, "try-error")) {       # Remove term
from model on error
                mf[[nn]] <- NULL
                t <- terms(update(t, as.formula(paste("~ . -", nn))), data=mf)
    ans <- .External2(stats:::C_modelmatrix, t, mf)
    r   <- .lm.fit(ans, mf[[1]])$residual
    stats:::naresid.exclude(omit, r)

## Note that lmresid now returns the same values as resid with the
## 1-level factor removed.
df <- data.frame(y=c(0,2,4,6,8), x1=c(1,1,2,2,NA),
lmresid(y~x1+x2, data=df)
resid(lm(y~x1, data=df, na.action=na.exclude))


PS, Peter, wasn't sure if you also meant to add comments, but they
didn't come through.

On Fri, Mar 11, 2016 at 3:40 AM, peter dalgaard <pdalgd at gmail.com> wrote:
>> On 11 Mar 2016, at 02:03 , Robert McGehee <rmcgehee at gmail.com> wrote:
>>> df <- data.frame(y=c(0,2,4,6,8), x1=c(1,1,2,2,NA),
>> x2=factor(c("A","A","A","A","B")))
>>> resid(lm(y~x1+x2, data=df, na.action=na.exclude)
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

More information about the R-help mailing list