[R] lm function in R

Daniel Malter daniel at umd.edu
Sun Feb 14 02:09:41 CET 2010


It seems to me that your question is more about econometrics than about
R. Any introductory econometrics textbook will cover this, as it is basic
material. See, for example, Greene 2006 or Wooldridge 2002.

Say X is your data matrix, which contains a column for each of the individual
variables (x), columns for their interactions, and one column of 1s for the
intercept. Let y be your dependent variable. Then the OLS estimates are
computed by

(X'X)^(-1) X'y

Or, in R:

solve(t(X)%*%X)%*%t(X)%*%y
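
As a minimal sketch of the formula above, here it is applied to the six
observations posted at the start of this thread (memory span y, age x1,
speech rate x2). The names dat, X_int, beta_hat and beta_lm are my own
choices for illustration. model.matrix() builds X from a formula, including
the column of 1s; with x1 * x2 it also adds the interaction column x1:x2.
Note that lm() itself solves the least-squares problem with a QR
decomposition rather than an explicit inverse, so the comparison below
should agree only up to rounding.

```r
# Data from the original question (students.csv), typed in directly
dat <- data.frame(
  y  = c(14, 23, 30, 50, 39, 67),
  x1 = c(4, 4, 7, 7, 10, 10),
  x2 = c(1, 2, 2, 4, 3, 6)
)

# Design matrix for the additive model: intercept column, x1, x2
X <- model.matrix(~ x1 + x2, data = dat)
y <- dat$y

# Normal equations: (X'X)^(-1) X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# The same model fitted with lm(), for comparison
beta_lm <- coef(lm(y ~ x1 + x2, data = dat))
print(cbind(normal_eq = drop(beta_hat), lm = beta_lm))

# With '*' the design matrix gains an interaction column, x1:x2
X_int <- model.matrix(~ x1 * x2, data = dat)
print(colnames(X_int))
```

Both columns of the printed comparison should match the equation quoted
from the PDF earlier in the thread, Y = 1.67 + X1 + 9.50 X2.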


Best,
Daniel 


-------------------------
cuncta stricte discussurus
-------------------------
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Something Something
Sent: Saturday, February 13, 2010 5:04 PM
To: Bert Gunter
Cc: r-help at r-project.org
Subject: Re: [R] lm function in R

I tried..

mod = lm(Y ~ X1*X2*X3, na.action = na.exclude)
formula(mod)

This produced....
Y ~ X1 * X2 * X3


When I typed just mod I got:

Call:
lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude)

Coefficients:
(Intercept)          X11          X21          X31      X11:X21      X11:X31
   177.9245       0.2005       2.4482       3.1216       0.8127     -26.6166
    X21:X31  X11:X21:X31
    -3.0398      29.6049


I am trying to figure out how R computed all these coefficients.





On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter <gunter.berton at gene.com> wrote:

> ?formula
>
>
> Bert Gunter
> Genentech Nonclinical Statistics
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Something Something
> Sent: Saturday, February 13, 2010 1:24 PM
> To: Daniel Nordlund
> Cc: r-help at r-project.org
> Subject: Re: [R] lm function in R
>
> Thanks Dan.  Yes, that was very helpful.  I didn't see the change from '*'
> to '+'.
>
> It seems that when I put * it means an interaction, and when I put + it's
> not an interaction.
>
> Is it correct to assume then that...
>
> When I put + R evaluates the following equation:
> Y-Hat = b0 + b1X1 + b2X2 + ... + bkXk
>
>
> But when I put * R evaluates the following equation:
> Y-Hat = b0 + b1X1 + b2X2 + ... + bkXk + b12 X12 + b13 X13 + ... + c
>
> Is this correct?  If it is then can someone point me to any sources that
> will explain how the coefficients (such as b0... bk, b12.. , b123..) are
> calculated.  I guess, one source is the R source code :) but is there any
> other documentation anywhere?
>
> Please let me know.  Thanks.
>
>
>
> On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund
> <djnordlund at verizon.net> wrote:
>
> > > -----Original Message-----
> > > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> > > > On Behalf Of Something Something
> > > Sent: Friday, February 12, 2010 5:28 PM
> > > To: Phil Spector; r-help at r-project.org
> > > Subject: Re: [R] lm function in R
> > >
> > > > Thanks for the replies, everyone.  Greatly appreciate it.  Some progress,
> > > > but now I am getting the following values when I don't use "as.factor":
> > >
> > > 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667
> > >
> > > Is that what you guys get?
> >
> >
> > If you look at Phil's response below, no, that is not what he got.  The
> > difference is that you are specifying an interaction, whereas Phil did not
> > (because the equation you initially specified did not include an
> > interaction).  Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.
> >
> > >
> > >
> > > On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector
> > > > <spector at stat.berkeley.edu> wrote:
> > >
> > > > By converting the two variables to factors, you are fitting
> > > > an entirely different model.  Leave out the as.factor stuff
> > > > and it will work exactly as you want it to.
> > > >
> > > > >> dat
> > > > >   V1 V2 V3 V4
> > > > 1 s1 14  4  1
> > > > 2 s2 23  4  2
> > > > 3 s3 30  7  2
> > > > 4 s4 50  7  4
> > > > 5 s5 39 10  3
> > > > 6 s6 67 10  6
> > > >
> > > >> names(dat) = c('id','y','x1','x2')
> > > >> z = lm(y~x1+x2,dat)
> > > > >> predict(z)
> > > > >        1        2        3        4        5        6
> > > > > 15.16667 24.66667 27.66667 46.66667 40.16667 68.66667
> > > >
> > > >
> > > >                                        - Phil Spector
> > > >                                         Statistical Computing
> Facility
> > > >                                         Department of Statistics
> > > >                                         UC Berkeley
> > > >                                         spector at stat.berkeley.edu
> > > >
> > > >
> > > >
> > > > On Fri, 12 Feb 2010, Something Something wrote:
> > > >
> > > >  Hello,
> > > >>
> > > > >> I am trying to learn how to perform Multiple Regression Analysis in R.  I
> > > > >> decided to take a simple example given in this PDF:
> > > >> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
> > > >>
> > > > >> I created a small CSV called students.csv that contains the following
> > > > >> data:
> > > >>
> > > >> s1 14 4 1
> > > >> s2 23 4 2
> > > >> s3 30 7 2
> > > >> s4 50 7 4
> > > >> s5 39 10 3
> > > >> s6 67 10 6
> > > >>
> > > >> Col headers:  Student id, Memory span(Y), age(X1), speech rate(X2)
> > > >>
> > > >> Now the expected results are:
> > > >>
> > > >> yHat[0]:15.166666666666668
> > > >> yHat[1]:24.666666666666668
> > > >> yHat[2]:27.666666666666664
> > > >> yHat[3]:46.666666666666664
> > > >> yHat[4]:40.166666666666664
> > > >> yHat[5]:68.66666666666667
> > > >>
> > > > >> This is based on the following equation (given in the PDF):
> > > > >> Y = 1.67 + X1 + 9.50 X2
> > > >>
> > > >> I ran the following commands in R:
> > > >>
> > > >> data = read.table("students.csv", head=F, as.is=T, na.string=".",
> > > >> row.nam=NULL)
> > > >> X1 = as.factor(data[[3]])
> > > >> X2 = as.factor(data[[4]])
> > > >> Y = data[[2]]
> > > >> mod = lm(Y ~ X1*X2, na.action = na.exclude)
> > > >> Y.hat = fitted(mod)
> > > >> Y.hat
> > > >>
> > > >> This gives me the following output:
> > > >>
> > > > >>> Y.hat
> > > >> 1  2  3  4  5  6
> > > >> 14 23 30 50 39 67
> > > >>
> > > >> Obviously I am doing something wrong.  Please help.  Thanks.
> > > >>
> >
> > Hope this is helpful,
> >
> > Dan
> >
> > Daniel Nordlund
> > Bothell, WA USA
> >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
