[R] GLM with Numeric and Factor as an Input

Rolf Turner r.turner at auckland.ac.nz
Tue Feb 25 21:04:31 CET 2014


On 26/02/14 01:40, Lorenzo Isella wrote:
> Dear All,
> Please consider the snippet at the end of the email.
> It is representative of the problems I am experiencing.
> I am trying to use glm (without using the formula interface because the
> original data is quite large) to model the response in a case where the
> predictors are a mix of numbers and factors.
> In the end, I always end up with an error message, despite having tried
> different choices for the "family" parameter.
> Maybe I am missing the obvious, but can anyone run glm with a
> combination of numbers and factors?
> Any help is appreciated.
> Cheers
>
> Lorenzo
>
> ###############################################################
> set.seed(1234)
>
> x <- rnorm(1000)
> dim(x) <- c(100,10)
> x <- as.data.frame(x)
> names(x) <- LETTERS[seq(10)]
>
> x$J <- round(x$J)
>
> x$J <- as.factor(x$J)
>
> y <- x$A
> x <- subset(x, select=-c(A))
>
> model <- glm.fit(x, y)  ## , family = gaussian()

From the help for glm.fit:

>> For glm.fit: x is a ***design*** matrix of dimension n * p, and y is
>> a vector of observations of length n.

(Emphasis mine.)
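A minimal sketch of why the data frame cannot serve as x directly. It reuses the data-generation steps from the quoted snippet. as.matrix(), which glm.fit() appears to apply to x internally, turns a data frame that contains a factor column into a *character* matrix rather than a numeric one. The object names m_bad and m_num are illustrative only.

```r
## Sketch: a data frame with a factor column is not a numeric design matrix.
## Data generation follows the quoted snippet.
set.seed(1234)
x <- as.data.frame(matrix(rnorm(1000), nrow = 100, ncol = 10))
names(x) <- LETTERS[seq(10)]
x$J <- as.factor(round(x$J))
y <- x$A
x <- subset(x, select = -A)

## With the factor J present, as.matrix() falls back to a character matrix.
m_bad <- as.matrix(x)
print(typeof(m_bad))  # "character"

## Without J every column is numeric, so the result is a numeric matrix.
m_num <- as.matrix(subset(x, select = -J))
print(typeof(m_num))  # "double"
```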

So if you want to use (or insist on using) glm.fit() rather than glm(), you
will have to construct your own design matrix, i.e. replace each factor
column by k-1 columns of dummy variables (where k is the number of levels
of the given factor).  Note that "x" should really be a *matrix*, not a
data frame, although it seems that data frames all of whose columns are
numeric get coerced to matrices, so it doesn't matter much.
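A sketch of building the design matrix, assuming the same simulated data as the quoted snippet. model.matrix() (stats package) does the dummy coding with the default treatment contrasts and also adds the column of 1s. This matters because glm.fit() does not add an intercept column itself; its intercept argument only concerns the null model. make_dummies() is a hypothetical helper that shows the same k-1 coding done by hand.

```r
## Sketch: supply glm.fit() with an explicit numeric design matrix.
## Data generation follows the quoted snippet.
set.seed(1234)
x <- as.data.frame(matrix(rnorm(1000), nrow = 100, ncol = 10))
names(x) <- LETTERS[seq(10)]
x$J <- as.factor(round(x$J))
y <- x$A
x <- subset(x, select = -A)

## model.matrix() adds the intercept column and, under the default
## treatment contrasts, replaces the k-level factor J by k - 1 dummies.
X <- model.matrix(~ ., data = x)

## glm.fit() needs the family *object*, i.e. gaussian(), not gaussian.
fit <- glm.fit(X, y, family = gaussian())

## Hypothetical helper: k - 1 indicator columns by hand, first level as baseline.
make_dummies <- function(f) {
  lev <- levels(f)
  sapply(lev[-1], function(l) as.numeric(f == l))
}
X_manual <- cbind(1, as.matrix(subset(x, select = -J)), make_dummies(x$J))
print(all.equal(unname(X_manual), unname(X), check.attributes = FALSE))

## Cross-check against the formula interface.
ref <- glm(y ~ ., data = data.frame(y = y, x), family = gaussian())
print(all.equal(unname(fit$coefficients), unname(coef(ref))))
```

Note that model.matrix() still materialises the full n x p numeric matrix, so this route does not by itself save memory compared with calling glm().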

cheers,

Rolf Turner
