[R] Regression with factor having 1 level

Nordlund, Dan (DSHS/RDA) NordlDJ at dshs.wa.gov
Fri Mar 11 02:45:15 CET 2016


> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of David
> Winsemius
> Sent: Thursday, March 10, 2016 4:39 PM
> To: Robert McGehee
> Cc: r-help at r-project.org
> Subject: Re: [R] Regression with factor having 1 level
> 
> 
> > On Mar 10, 2016, at 2:00 PM, Robert McGehee <rmcgehee at gmail.com> wrote:
> >
> > Hello R-helpers,
> > I'd like a function that given an arbitrary formula and a data frame
> > returns the residual of the dependent variable, and maintains all NA values.
> 
> What does "maintains all NA values" actually mean?
> >
> > Here's an example that will give me what I want if my formula is
> > y~x1+x2+x3 and my data frame is df:
> >
> > resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
> >
> > Here's the catch, I do not want my function to ever fail due to a
> > factor with only one level. A one-level factor may appear because 1)
> > the user passed it in, or 2) (more common) only one factor in a term
> > is left after na.exclude removes the other NA values.
> >
> > Here is the error I would get
> 
> From what code?
> 
> 
> > above if one of the terms was a factor with one level:
> > Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> >  contrasts can be applied only to factors with 2 or more levels
> 
> Unable to create that error with the actions you describe but do not actually
> offer in coded form:
> 
> 
> > dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=TRUE, x3=rnorm(10))
> > lm(y~x1+x2+x3, dfrm)
> 
> Call:
> lm(formula = y ~ x1 + x2 + x3, data = dfrm)
> 
> Coefficients:
> (Intercept)           x1       x2TRUE           x3
>    -0.16274     -0.30032           NA     -0.09093
> 
> > resid(lm(y~x1+x2+x3, data=dfrm, na.action=na.exclude))
>           1           2           3           4           5           6
> -0.16097245  0.65408508 -0.70098223 -0.15360434  1.26027872  0.55752239
>           7           8           9          10
> -0.05965653 -2.17480605  1.42917190 -0.65103650
> 
> >
> 
> 
> > Instead of giving me an error, I'd like the function to do just what
> > lm() normally does when it sees a variable with no variance, ignore
> > the variable (coefficient is NA) and continue to regress out all the other
> > variables.
> > Thus if 'x2' is a factor with one level in the above example, I'd
> > like the function to return the result of:
> > resid(lm(y~x1+x3, data=df, na.action=na.exclude))
> > Can anyone provide me a straightforward recommendation for how to do this?
> > I feel like it should be easy, but I'm honestly stuck, and my Google
> > searching for this hasn't gotten anywhere. The key is that I'd like
> > the solution to be generic enough to work with an arbitrary linear
> > formula, and not substantially kludgy (like trying every combination of
> > regression terms until one works) as I'll be running this a lot on
> > big data sets and don't want my computation time swamped by running
> > unnecessary regressions or checking for number of factors after removing
> > NAs.
> >
> > Thanks in advance!
> > --Robert
> >
> >
> > PS. The Google search feature in the R-help archives appears to be down:
> > http://tolstoy.newcastle.edu.au/R/
> 
> It's working for me.
> 
> >
> 
> David Winsemius
> Alameda, CA, USA
> 

I agree that what is wanted is not clear.  However, if dfrm is created with x2 as a factor, then you get the error message that the OP mentions when you run the regression.

> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=as.factor(TRUE), x3=rnorm(10))
> lm(y~x1+x2+x3, dfrm, na.action=na.exclude)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
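
If the goal is what I think it is, one possible approach is to build the model frame first, find any factor (or character) predictors left with fewer than two distinct values once the NA rows are gone, drop every term that involves them, and then fit with na.exclude. The following is only a rough sketch under those assumptions: safe_resid is a name I made up, it assumes a two-sided formula, and it has not been tested on interactions or on large data.

```r
# Hypothetical helper (not part of base R): residuals from lm() after
# dropping terms that involve a factor with fewer than two observed levels.
safe_resid <- function(formula, data) {
  # Model frame restricted to complete cases, i.e. the rows lm() would use
  mf <- model.frame(formula, data = data, na.action = na.omit)
  tt <- attr(mf, "terms")
  labels <- attr(tt, "term.labels")

  # Predictors that are factors/characters with < 2 distinct values left
  preds <- mf[-1]
  one_level <- vapply(preds, function(v) {
    (is.factor(v) || is.character(v)) && length(unique(v)) < 2
  }, logical(1))
  bad <- names(preds)[one_level]

  if (length(bad) > 0 && length(labels) > 0) {
    # Rows of "factors" are variables, columns are terms
    fac <- attr(tt, "factors")
    involved <- colSums(fac[bad, , drop = FALSE] > 0) > 0
    keep <- labels[!involved]
    if (length(keep) == 0) keep <- "1"   # nothing left: intercept-only fit
    formula <- reformulate(keep, response = tt[[2]],
                           intercept = attr(tt, "intercept") == 1)
  }

  # na.exclude pads the residuals with NA so the length matches nrow(data)
  resid(lm(formula, data = data, na.action = na.exclude))
}

dfrm <- data.frame(y = rnorm(10), x1 = rnorm(10),
                   x2 = as.factor(TRUE), x3 = rnorm(10))
safe_resid(y ~ x1 + x2 + x3, dfrm)
```

Note that once a term is dropped, rows that were missing only in the dropped variable are used in the fit again, which matches resid(lm(y~x1+x3, ...)) as requested but differs from fitting on the original complete cases.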


Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services


