[R] Factors in a regression using lm()

Ivan Calandra ivan.calandra at uni-hamburg.de
Tue Oct 12 11:56:36 CEST 2010


  Hi,

Your response (dependent) variable, which goes on the left side of 
the '~' in the formula, has to be numeric for lm(). In your example deny 
is a factor, and that is what triggers the error.
The explanatory variables, on the right side of the '~', can be numeric 
or factors; lm() expands factors into dummy variables automatically. So 
hir, dir, ccs and mcs (numeric) and black (factor) are fine as they are.

There are two possibilities (not mutually exclusive):
- convert the response to numeric. Note that as.numeric() on a factor 
returns the internal level codes (1 and 2 here), so for a 0/1 coding use 
something like as.numeric(deny == "yes"). See ?factor and ?as.numeric, as 
well as the stringsAsFactors argument of ?read.table if you imported the 
data.frame yourself
- adjust your model formula. It might be that you mixed up the variables 
in the formula. See ?formula
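
A minimal sketch of the first option, in case it helps. It uses a small 
made-up data frame d instead of the real Hdma data, so the values and the 
helper column name deny01 are only assumptions for illustration:

```r
# Toy data standing in for Hdma (values are invented for illustration).
set.seed(1)
n <- 200
d <- data.frame(
  hir   = runif(n, 0.1, 0.5),                          # numeric predictor
  black = factor(rep(c("no", "yes"), length.out = n),  # factor predictor
                 levels = c("no", "yes")),
  deny  = factor(sample(c("no", "yes"), n, replace = TRUE),
                 levels = c("no", "yes"))              # factor response
)

# Recode the factor response as 0/1.
# as.numeric(d$deny) alone would return the level codes 1/2 instead.
d$deny01 <- as.numeric(d$deny == "yes")

# Numeric and factor predictors can be mixed on the right-hand side;
# lm() creates the dummy variable for black by itself.
fit <- lm(deny01 ~ hir + black, data = d)
names(coef(fit))   # "(Intercept)" "hir" "blackyes"
```

The same recoding should carry over to the real data by replacing d with 
Hdma and adding the remaining predictors to the formula.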

HTH,
Ivan

On 10/12/2010 11:39, Gabriel Bergin wrote:
> Hi,
>
> I am trying to do a multiple regression on the dataset "Hdma", available in
> the Ecdat package.
>
> The data looks like this:
>> str(Hdma)
> 'data.frame': 2381 obs. of  13 variables:
>   $ dir        : num  0.221 0.265 0.372 0.32 0.36 ...
>   $ hir        : num  0.221 0.265 0.248 0.25 0.35 ...
>   $ lvr        : num  0.8 0.922 0.92 0.86 0.6 ...
>   $ ccs        : num  5 2 1 1 1 1 1 2 2 2 ...
>   $ mcs        : num  2 2 2 2 1 1 2 2 2 1 ...
>   $ pbcr       : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
>   $ dmi        : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 1 ...
>   $ self       : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
>   $ single     : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 2 1 1 2 ...
>   $ uria       : num  3.9 3.2 3.2 4.3 3.2 ...
>   $ comdominiom: num  0 0 0 0 0 0 1 0 0 0 ...
>   $ black      : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
>   $ deny       : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 2 1 ...
>
> I would like to try a more complex regression, but even this relatively
> uncomplicated one returns an error:
>
> summary(lm(deny ~ hir + dir + ccs + mcs + black))
>
> The error I get is:
> Error in storage.mode(y)<- "double" :
>    invalid to change the storage mode of a factor
> In addition: Warning message:
> In model.response(mf, "numeric") :
>    using type="numeric" with a factor response will be ignored
>
> I understand that there is something wrong due to the fact that some of the
> variables are factors. But as far as I've grasped, it should be possible to
> include factor variables when using lm(). Am I in error in thinking this?
>
> Sincerely,
> Gabriel Bergin
> Undergraduate economics student

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php