[R] Dealing with data

TGS cran.questions at gmail.com
Fri Aug 13 21:59:56 CEST 2010


But in your comment, it sounded like you were in the realm of ANOVA when you made the degrees of freedom comment. I'm not going to get into the theory of statistics with you :) I'm just trying to learn R, so take it easy. Yes, I understand that in the regression problem the degrees of freedom for regression are 1, and in ANOVA the degrees of freedom for sprays are 5. Thanks.

On Aug 13, 2010, at 12:54 PM, Greg Snow wrote:

If you do as.numeric on InsectSprays$spray and use the result as a predictor in lm, then it will only fit 1 degree of freedom, not 5; try it and see.  That is why I was asking and giving an alternative that would still use 5 degrees of freedom.
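
A minimal sketch of that comparison, assuming the built-in InsectSprays data frame (columns count and spray). The object names fit_factor and fit_numeric are made up here for illustration:

```r
data(InsectSprays)

# spray kept as a factor: lm builds indicator columns for the 6 levels
fit_factor <- lm(count ~ spray, data = InsectSprays)

# spray coerced to the integer codes 1..6: lm fits a single slope
fit_numeric <- lm(count ~ as.numeric(spray), data = InsectSprays)

# Degrees of freedom: model term first, residuals second
anova(fit_factor)$Df    # 5 66
anova(fit_numeric)$Df   # 1 70
```

summary(aov(count ~ spray, data = InsectSprays)) reports the same 5-df test for spray on a single line.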

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: TGS [mailto:cran.questions at gmail.com]
> Sent: Friday, August 13, 2010 1:52 PM
> To: Greg Snow
> Subject: Re: [R] Dealing with data
> 
> P.S. The degrees of freedom for sprays would be 5 and not 1.
> 
> On Aug 13, 2010, at 12:27 PM, Greg Snow wrote:
> 
> So you want 1 degree of freedom for InsectSprays?  You believe that the
> difference between A and B is exactly the same as between B and C which
> is exactly the same as between D and E (etc.)?  That seems an odd
> assumption, but you can get that by using as.numeric (as I and others
> have already stated).
> 
> If on the other hand you want InsectSprays to be treated correctly with
> the correct number of degrees of freedom, but have the output on a
> single line testing the overall effect, then you want to use the aov
> function rather than lm (internally they do the same thing, but the
> default summary output for aov is 1 line per term).
> 
> Hope this helps,
> 
> 
> 
>> -----Original Message-----
>> From: TGS [mailto:cran.questions at gmail.com]
>> Sent: Friday, August 13, 2010 11:51 AM
>> To: Greg Snow
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Dealing with data
>> 
>> # Greg, if R automatically does that then I don't know why it's
>> # treating each indicator as a different regressor. In other words,
>> # I am interested in treating 'spray' as one independent variable.
>> #
>> # Erik, which book do you suggest I read? Thanks.
>> 
>> data(InsectSprays)
>> lm(InsectSprays$count ~ 0 + InsectSprays$spray)
>> 
>> On Aug 13, 2010, at 10:34 AM, Greg Snow wrote:
>> 
>> R/S does all of that automatically for you; you do not need to manually
>> create the indicator variables.
>> 
>> If you do something like:
>> 
>>> fit <- lm( Sepal.Width ~ Species, data=iris, x=TRUE)
>> 
>> Then look at the matrix actually used:
>> 
>>> fit$x
>> 
>> Or the output:
>> 
>>> summary(fit)
>> 
>> You will see that Species was automatically converted into indicator
>> variables and those were used in the regression.
>> 
>> If you really need the indicator variables yourself, look at the
>> model.matrix function, e.g.:
>> 
>>> model.matrix( ~Species, data=iris )
>> 
>> Or
>> 
>>> model.matrix( ~Species - 1, data=iris )
>> 
>> If you really want 1 for A, 2 for B, etc. then look at as.numeric on a
>> factor variable (e.g. as.numeric(iris$Species) ).
>> 
>> Hope this helps,
>> 
>> 
>> 
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of TGS
>>> Sent: Friday, August 13, 2010 11:22 AM
>>> To: David Winsemius
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Dealing with data
>>> 
>>> To clarify, I'd like to create a column of indicators for the
>>> respective letters so that I could maybe do regression on indicators,
>>> etc.
>>> 
>>> For instance, "A" gets "1", "B" gets "2", and so on.
>>> 
>>> On Aug 13, 2010, at 10:19 AM, David Winsemius wrote:
>>> 
>>> 
>>> On Aug 13, 2010, at 1:03 PM, TGS wrote:
>>> 
>>>> # how would I code in R to look at the letter of the alphabet
>>>> # in the second column and create an indicator column for the
>>>> # corresponding letter?
>>>> 
>>>> data(InsectSprays)
>>>> InsectSprays$spray
>>> 
>>> It's already what most people mean when they say "indicator column",
>>> i.e., a factor variable (and not a character vector) ... so, what do
>>> _you_ mean?
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> David Winsemius, MD
>>> West Hartford, CT
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.