[R] Regression with factor having 1 level
David Winsemius
dwinsemius at comcast.net
Fri Mar 11 01:39:24 CET 2016
> On Mar 10, 2016, at 2:00 PM, Robert McGehee <rmcgehee at gmail.com> wrote:
>
> Hello R-helpers,
> I'd like a function that given an arbitrary formula and a data frame
> returns the residual of the dependent variable, and maintains all NA values.
What does "maintains all NA values" actually mean?
>
> Here's an example that will give me what I want if my formula is y~x1+x2+x3
> and my data frame is df:
>
> resid(lm(y~x1+x2+x3, data=df, na.action=na.exclude))
>
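One plausible reading of "maintains all NA values" is that the residual vector should have one entry per input row, with NA wherever a row was incomplete. A minimal sketch of that behaviour, using a made-up five-row data frame (the name df and its values are illustrative assumptions, not the poster's data):

```r
# Hypothetical toy data: row 3 has a missing predictor value.
df <- data.frame(y  = c(1.0, 2.1, 2.9, 4.2, 5.1),
                 x1 = c(1, 2, NA, 4, 5))

# na.omit drops the incomplete row from the fit and from the residuals.
r_omit <- resid(lm(y ~ x1, data = df, na.action = na.omit))

# na.exclude drops it from the fit but pads the residuals back out with NA.
r_exclude <- resid(lm(y ~ x1, data = df, na.action = na.exclude))

length(r_omit)            # 4
length(r_exclude)         # 5
which(is.na(r_exclude))   # position 3
```

With na.exclude the residuals line up row for row with the original data frame, which is presumably what the poster is after.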
> Here's the catch, I do not want my function to ever fail due to a factor
> with only one level. A one-level factor may appear because 1) the user
> passed it in, or 2) (more common) only one factor in a term is left after
> na.exclude removes the other NA values.
>
> Here is the error I would get
From what code?
> above if one of the terms was a factor with
> one level:
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> contrasts can be applied only to factors with 2 or more levels
I am unable to reproduce that error from the actions you describe but do not actually offer in coded form:
> dfrm <- data.frame(y=rnorm(10), x1=rnorm(10) ,x2=TRUE, x3=rnorm(10))
> lm(y~x1+x2+x3, dfrm)
Call:
lm(formula = y ~ x1 + x2 + x3, data = dfrm)
Coefficients:
(Intercept)           x1       x2TRUE           x3
   -0.16274     -0.30032           NA     -0.09093
> resid(lm(y~x1+x2+x3, data=dfrm, na.action=na.exclude))
          1           2           3           4           5           6
-0.16097245  0.65408508 -0.70098223 -0.15360434  1.26027872  0.55752239
          7           8           9          10
-0.05965653 -2.17480605  1.42917190 -0.65103650
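A possible reason the run above does not fail: x2=TRUE is a logical column, not a factor, and lm() appears to treat a logical as having both FALSE and TRUE levels, so the contrasts step goes through and the coefficient simply comes back NA. A minimal sketch, assuming x2 is instead stored as a factor with a single level (dfrm2 and its values are made up for illustration), does trigger the reported message:

```r
# Hypothetical data: x2 is a genuine factor with only one level.
set.seed(1)
dfrm2 <- data.frame(y  = rnorm(10),
                    x1 = rnorm(10),
                    x2 = factor(rep("a", 10)),
                    x3 = rnorm(10))

# Capture the error message instead of stopping the session.
msg <- tryCatch(
  {
    lm(y ~ x1 + x2 + x3, data = dfrm2)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

The captured message should be the "contrasts can be applied only to factors with 2 or more levels" text quoted in the question.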
>
> Instead of giving me an error, I'd like the function to do just what lm()
> normally does when it sees a variable with no variance, ignore the variable
> (coefficient is NA) and continue to regress out all the other variables.
> Thus if 'x2' is a factor with one level in the above example, I'd like
> the function to return the result of:
> resid(lm(y~x1+x3, data=df, na.action=na.exclude))
> Can anyone provide me a straightforward recommendation for how to do this?
> I feel like it should be easy, but I'm honestly stuck, and my Google
> searching for this hasn't gotten anywhere. The key is that I'd like the
> solution to be generic enough to work with an arbitrary linear formula, and
> not substantially kludgy (like trying every combination of regression terms
> until one works) as I'll be running this a lot on big data sets and don't
> want my computation time swamped by running unnecessary regressions or
> checking for number of factors after removing NAs.
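One way this might be approached, offered only as a sketch and not as a tested general solution: build the model frame once, find the factor-like variables left with fewer than two distinct values among the complete rows, remove every term that involves them, and fit a single regression. The helper name resid_safe and the example data d are hypothetical. The sketch assumes a formula without "." or offset terms.

```r
# resid_safe() is a hypothetical helper, not part of base R.
resid_safe <- function(formula, data) {
  tt <- terms(formula, data = data)
  # Keep NA rows for now so we can see which rows lm() would actually use.
  mf <- model.frame(tt, data = data, na.action = na.pass)
  ok <- complete.cases(mf)

  # Factor-like variables with fewer than two distinct values in those rows.
  one_level <- vapply(mf, function(v) {
    (is.factor(v) || is.character(v)) && length(unique(v[ok])) < 2L
  }, logical(1))
  bad_vars <- names(mf)[one_level]

  if (length(bad_vars) > 0L) {
    fac <- attr(tt, "factors")
    # Terms (columns of the factors matrix) involving a degenerate variable.
    hit <- colSums(fac[bad_vars, , drop = FALSE] > 0) > 0
    bad_terms <- colnames(fac)[hit]
    formula <- update(formula,
                      paste(". ~ . -", paste(bad_terms, collapse = " - ")))
  }
  resid(lm(formula, data = data, na.action = na.exclude))
}

# Usage on made-up data: the only "b" in x2 sits in the row where x1 is NA,
# so x2 has a single level once incomplete rows are set aside.
set.seed(42)
d <- data.frame(y  = rnorm(8),
                x1 = c(rnorm(7), NA),
                x2 = factor(c(rep("a", 7), "b")),
                x3 = rnorm(8))
r <- resid_safe(y ~ x1 + x2 + x3, data = d)
```

This costs one extra model.frame() call and one lm() fit rather than a search over term combinations. Note that once a variable is dropped, rows that were incomplete only because of that variable are used in the fit, which matches the resid(lm(y~x1+x3, ...)) result asked for above.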
>
> Thanks in advance!
> --Robert
>
>
> PS. The Google search feature in the R-help archives appears to be down:
> http://tolstoy.newcastle.edu.au/R/
It's working for me.
David Winsemius
Alameda, CA, USA