[R] Adding a subset to a glm messes up factors?

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Dec 7 15:03:28 CET 2007


First, 'subset' is an argument to glm(), but for some reason you did not 
use it.  Your subject line is quite misleading, and had it been the more 
accurate

 	Adding a 'data' argument to glm messes up factors?

you might have realised the problem.
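
A minimal sketch of the two arguments, assuming a made-up data frame in place of data.all (the real data were not posted, so the values and the second species level "s" are purely illustrative):

```r
# Hypothetical stand-in for data.all; only the column names come from the post.
set.seed(1)
data.all <- data.frame(
  year    = factor(rep(1989:1993, each = 20)),
  species = rep(c("n", "s"), times = 50),
  cpue    = rlnorm(100)
)

# 'data' and 'subset' are separate named arguments of glm().
fit_subset <- glm(log(cpue) ~ year, family = gaussian,
                  data = data.all, subset = species == "n")

# Passing subset(...) as the third positional argument, as in the post,
# matches it to 'data'; naming it makes that explicit.
fit_data <- glm(log(cpue) ~ year, family = gaussian,
                data = subset(data.all, species == "n"))

# Both calls fit the same rows, so the coefficients should agree.
all.equal(coef(fit_subset), coef(fit_data))
```

Either form restricts the fit to the same rows; naming the arguments simply makes clear which dataset the model is using.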

Second, your models are fitted to different datasets: the first to objects 
in your workspace, and the second to columns of data.all. Since you have 
not (as we asked) given a reproducible example we cannot know what those 
differences are, but differences in the datasets will be the key.
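
One difference that would produce output like that quoted below is a 'year' that is a factor in the workspace but numeric in data.all. That is only a guess without the data; the sketch uses invented values and assumes the workspace copies were made by hand:

```r
# Hypothetical data; only the column names come from the post.
set.seed(1)
data.all <- data.frame(
  year    = rep(1989:1993, each = 20),    # numeric, e.g. as read from a file
  species = rep(c("n", "s"), times = 50),
  cpue    = rlnorm(100)
)

# Workspace copies, with year converted to a factor (an assumption).
year <- factor(data.all$year)
cpue <- data.all$cpue

# No 'data' argument: the variables are found in the workspace.
fit_ws <- glm(log(cpue) ~ year, family = gaussian)

# With 'data': year is looked up in data.all first, where it is numeric.
fit_df <- glm(log(cpue) ~ year, family = gaussian, data = data.all)

names(coef(fit_ws))   # one coefficient per non-baseline year
names(coef(fit_df))   # intercept and a single slope for year

# Converting the column makes the subsetted fit treat year as a factor too.
data.all$year <- factor(data.all$year)
fit_fixed <- glm(log(cpue) ~ year, family = gaussian,
                 data = data.all, subset = species == "n")
```

Comparing str(data.all) with class(year) would show whether this is what happened in the original poster's session.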

Third, the best way to fit linear models is lm(), not 
glm(family=gaussian).
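
A small check, on invented data, that the two calls describe the same model (the data frame 'd' is hypothetical):

```r
# Hypothetical data; any response and factor would do.
set.seed(1)
d <- data.frame(
  year = factor(rep(1989:1993, each = 20)),
  cpue = rlnorm(100)
)

fit_lm  <- lm(log(cpue) ~ year, data = d)
fit_glm <- glm(log(cpue) ~ year, family = gaussian, data = d)

# The estimates should agree up to numerical tolerance.
all.equal(coef(fit_lm), coef(fit_glm))
```

lm() solves the least-squares problem directly, whereas glm() reaches the same estimates by iteratively reweighted least squares, and summary() on the lm fit also reports the usual F statistic and R-squared.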


On Fri, 7 Dec 2007, Muri Soares wrote:

> I have a problem with running a glm using a subset of my data. Whenever
> I choose a subset, in the summary the factors aren't shown (as if the
> variable was a continuous variable). If I don't use subsets then all the
> factors are shown. I have copied the output from summary for both cases.
>
> Thanks for the help,
> Muri
>
>> model<-glm(log(cpue)~year,family=gaussian)
> Call:
> glm(formula = log(cpue) ~ year, family = gaussian)
>
> Deviance Residuals:
>    Min       1Q   Median       3Q      Max
> -2.0962  -0.5851  -0.1241   0.4805   3.9236
>
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept)   0.8899     0.1844   4.825 1.42e-06 ***
> year1990     -0.6107     0.1925  -3.173  0.00152 **
> year1991     -1.7466     0.1902  -9.184  < 2e-16 ***
> year1992     -1.4061     0.1864  -7.544 5.07e-14 ***
> year1993     -1.4069     0.1860  -7.565 4.31e-14 ***
> ...
>
>> model<-glm(log(cpue)~year,family=gaussian,subset(data.all,species=="n"))
> Call:
> glm(formula = log(cpue) ~ year, family = gaussian, data = subset(data.all,
>    species == "n"))
>
> Deviance Residuals:
>     Min        1Q    Median        3Q       Max
> -1.64577  -0.61671  -0.08972   0.55792   2.73737
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) 32.446570  10.076895   3.220  0.00135 **
> year        -0.016345   0.005037  -3.245  0.00123 **
> ---

>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


PLEASE do!

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


