[R] Regression with factor having 1 level

Robert McGehee rmcgehee at gmail.com
Thu Mar 10 23:00:00 CET 2016

Hello R-helpers,
I'd like a function that given an arbitrary formula and a data frame
returns the residual of the dependent variable, and maintains all NA values.

Here's an example that will give me what I want if my formula is y~x1+x2+x3
and my data frame is df:

resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
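To make "maintains all NA values" concrete, here is a small sketch on made-up data (the values and the positions of the NAs are invented purely for illustration). With na.exclude, resid() pads the result so it lines up row for row with df:

```r
# Toy data, invented for illustration only
set.seed(1)
df <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
df$y[2]  <- NA   # missing response
df$x1[5] <- NA   # missing predictor

fit <- lm(y ~ x1 + x2 + x3, data = df, na.action = na.exclude)
r <- resid(fit)

length(r) == nrow(df)  # residual vector has one entry per row of df
which(is.na(r))        # the rows dropped from the fit come back as NA
```

With the default na.omit, the residual vector would instead be shorter than nrow(df).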

Here's the catch: I do not want my function to ever fail due to a factor
with only one level. A one-level factor may appear because 1) the user
passed it in, or 2) (more common) only one level of a factor is left after
na.exclude removes the rows containing NA values.

Here is the error I would get above if one of the terms was a factor with
one level:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
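
A minimal way to reproduce case 2), again on invented data: x2 has two levels in df, but every row with the second level has a missing y, so only one level survives once the NA rows are removed. The error is caught with tryCatch() here only so the script keeps running:

```r
# Toy data, invented for illustration only
set.seed(1)
df <- data.frame(y  = rnorm(6),
                 x1 = rnorm(6),
                 x2 = factor(c("a", "a", "a", "a", "b", "b")),
                 x3 = rnorm(6))
df$y[5:6] <- NA   # only rows with x2 == "a" remain complete

res <- tryCatch(
  resid(lm(y ~ x1 + x2 + x3, data = df, na.action = na.exclude)),
  error = function(e) conditionMessage(e)   # return the message instead of stopping
)
res   # the error message as a character string
```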

Instead of giving me an error, I'd like the function to do just what lm()
normally does when it sees a variable with no variance: ignore the variable
(coefficient is NA) and continue to regress out all the other variables.
Thus if 'x2' is a factor with one level in the above example, I'd like
the function to return the result of:
resid(lm(y~x1+x3, data=df, na.action=na.exclude))
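
For comparison, this is the lm() behaviour I mean, sketched on invented data with a constant numeric x2 rather than a one-level factor. The constant column is aliased with the intercept, gets an NA coefficient, and the residuals match those of the model without it:

```r
# Toy data, invented for illustration only
set.seed(1)
df <- data.frame(y = rnorm(8), x1 = rnorm(8), x3 = rnorm(8))
df$x2   <- 1    # numeric predictor with no variance
df$y[3] <- NA

full    <- lm(y ~ x1 + x2 + x3, data = df, na.action = na.exclude)
reduced <- lm(y ~ x1 + x3,      data = df, na.action = na.exclude)

coef(full)["x2"]                         # NA: the term is silently dropped
all.equal(resid(full), resid(reduced))   # residuals agree, NA in row 3 kept
```

This is the result I would like to get when x2 is a one-level factor as well.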

Can anyone provide me a straightforward recommendation for how to do this?
I feel like it should be easy, but I'm honestly stuck, and my Google
searching for this hasn't gotten anywhere. The key is that I'd like the
solution to be generic enough to work with an arbitrary linear formula, and
not substantially kludgy (like trying every combination of regression terms
until one works), as I'll be running this a lot on big data sets and don't
want my computation time swamped by running unnecessary regressions or
checking the number of factor levels after removing NAs.

Thanks in advance!

PS. The Google search feature in the R-help archives appears to be down:
