[R] lm function in R

Sun Feb 14 00:20:02 CET 2010

On Feb 13, 2010, at 5:03 PM, Something Something wrote:

> I tried..
>
> mod = lm(Y ~ X1*X2*X3, na.action = na.exclude)
> formula(mod)
>
> This produced....
> Y ~ X1 * X2 * X3
>
>
> When I typed just mod I got:
>
> Call:
> lm(formula = Y ~ X1 * X2 * X3, na.action = na.exclude)
>
> Coefficients:
> (Intercept)          X11          X21          X31      X11:X21       
> X11:X31
>     X21:X31  X11:X21:X31
>   177.9245       0.2005       2.4482       3.1216       0.8127      
> -26.6166
>     -3.0398      29.6049
>
>
> I am trying to figure out how R computed all these coefficients.

 From your question it is difficult to determine what sort of tutoring  
you are expecting. To get the code of an R formula, you just type its  
name:

lm

Leads to lm.fit:

lm.fit

Reading further it appears the lm and lm.fit functions are really  
front ends for this call:

.Fortran("dqrls", qr = x, n = n, p = p, y = y, ny = ny,
         tol = as.double(tol), coefficients = mat.or.vec(p, ny),
         residuals = y, effects = y, rank = integer(1L), pivot = 1L:p,
         qraux = double(p), work = double(2 * p), PACKAGE = "base")

Seems pretty likely that is a QR decomposition-based method that i  
implemented in compiled code.

So if you want to go deeper, at least you know what to search for. Or  
if you want to know how regression works on a matrix level, you should  
consult a good reference text or Wikipedia, which is surprisingly good  
for that sort of question these days.

-- 
David.
>
>
>
>
>
> On Sat, Feb 13, 2010 at 1:30 PM, Bert Gunter  
> <gunter.berton at gene.com> wrote:
>
>> ?formula
>>
>>
>> Bert Gunter
>> Genentech Nonclinical Statistics
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org 
>> ]
>> On
>> Behalf Of Something Something
>> Sent: Saturday, February 13, 2010 1:24 PM
>> To: Daniel Nordlund
>> Cc: r-help at r-project.org
>> Subject: Re: [R] lm function in R
>>
>> Thanks Dan.  Yes that was very helpful.  I didn't see the change  
>> from '*'
>> to
>> '+'.
>>
>> Seems like when I put * it means - interaction & when I put + it's  
>> not an
>> interaction.
>>
>> Is it correct to assume then that...
>>
>> When I put + R evaluates the following equation:
>> Y-Hat = b0 + b1X1 + b2X2 + . . . bkXk + 7 7 7 + bkXk
>>
>>
>> But when I put * R evaluates the following equation;
>> Y-Hat = b0 + b1X1 + b2x2 + ... + bkXk + b12 X12+ b13 X13 +........   
>> + c
>>
>> Is this correct?  If it is then can someone point me to any sources  
>> that
>> will explain how the coefficients (such as b0... bk, b12.. ,  
>> b123..) are
>> calculated.  I guess, one source is the R source code :) but is  
>> there any
>> other documentation anywhere?
>>
>> Please let me know.  Thanks.
>>
>>
>>
>> On Fri, Feb 12, 2010 at 5:54 PM, Daniel Nordlund
>> <djnordlund at verizon.net>wrote:
>>
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org [mailto:
>> r-help-bounces at r-project.org]
>>>> On Behalf Of Something Something
>>>> Sent: Friday, February 12, 2010 5:28 PM
>>>> To: Phil Spector; r-help at r-project.org
>>>> Subject: Re: [R] lm function in R
>>>>
>>>> Thanks for the replies everyone.  Greatly appreciate it.  Some
>> progress,
>>>> but
>>>> now I am getting the following values when I don't use "as.factor"
>>>>
>>>> 13.14167 25.11667 28.34167 49.14167 40.39167 66.86667
>>>>
>>>> Is that what you guys get?
>>>
>>>
>>> If you look at Phil's response below, no, that is not what he  
>>> got.  The
>>> difference is that you are specifying an interaction, whereas Phil  
>>> did
>> not
>>> (because the equation you initially specified did not include an
>>> interaction.  Use Y ~ X1 + X2 instead of Y ~ X1*X2 for your formula.
>>>
>>>>
>>>>
>>>> On Fri, Feb 12, 2010 at 5:00 PM, Phil Spector
>>>> <spector at stat.berkeley.edu>wrote:
>>>>
>>>>> By converting the two variables to factors, you are fitting
>>>>> an entirely different model.  Leave out the as.factor stuff
>>>>> and it will work exactly as you want it to.
>>>>>
>>>>> dat
>>>>>>
>>>>> V1 V2 V3 V4
>>>>> 1 s1 14  4  1
>>>>> 2 s2 23  4  2
>>>>> 3 s3 30  7  2
>>>>> 4 s4 50  7  4
>>>>> 5 s5 39 10  3
>>>>> 6 s6 67 10  6
>>>>>
>>>>>> names(dat) = c('id','y','x1','x2')
>>>>>> z = lm(y~x1+x2,dat)
>>>>>> predict(z)
>>>>>>
>>>>>      1        2        3        4        5        6 15.16667
>> 24.66667
>>>>> 27.66667 46.66667 40.16667 68.66667
>>>>>
>>>>>
>>>>>                                       - Phil Spector
>>>>>                                        Statistical Computing
>> Facility
>>>>>                                        Department of Statistics
>>>>>                                        UC Berkeley
>>>>>                                        spector at stat.berkeley.edu
>>>>>
>>>>>
>>>>>
>>>>> On Fri, 12 Feb 2010, Something Something wrote:
>>>>>
>>>>> Hello,
>>>>>>
>>>>>> I am trying to learn how to perform Multiple Regression  
>>>>>> Analysis in
>> R.
>>>> I
>>>>>> decided to take a simple example given in this PDF:
>>>>>> http://www.utdallas.edu/~herve/abdi-prc-pretty.pdf
>>>>>>
>>>>>> I created a small CSV called, students.csv that contains the
>> following
>>>>>> data:
>>>>>>
>>>>>> s1 14 4 1
>>>>>> s2 23 4 2
>>>>>> s3 30 7 2
>>>>>> s4 50 7 4
>>>>>> s5 39 10 3
>>>>>> s6 67 10 6
>>>>>>
>>>>>> Col headers:  Student id, Memory span(Y), age(X1), speech  
>>>>>> rate(X2)
>>>>>>
>>>>>> Now the expected results are:
>>>>>>
>>>>>> yHat[0]:15.166666666666668
>>>>>> yHat[1]:24.666666666666668
>>>>>> yHat[2]:27.666666666666664
>>>>>> yHat[3]:46.666666666666664
>>>>>> yHat[4]:40.166666666666664
>>>>>> yHat[5]:68.66666666666667
>>>>>>
>>>>>> This is based on the following equation (given in the PDF):  Y =
>> 1.67
>>> +
>>>> X1
>>>>>> +
>>>>>> 9.50 X2
>>>>>>
>>>>>> I ran the following commands in R:
>>>>>>
>>>>>> data = read.table("students.csv", head=F, as.is=T, na.string=".",
>>>>>> row.nam=NULL)
>>>>>> X1 = as.factor(data[[3]])
>>>>>> X2 = as.factor(data[[4]])
>>>>>> Y = data[[2]]
>>>>>> mod = lm(Y ~ X1*X2, na.action = na.exclude)
>>>>>> Y.hat = fitted(mod)
>>>>>> Y.hat
>>>>>>
>>>>>> This gives me the following output:
>>>>>>
>>>>>> Y.hat
>>>>>>>
>>>>>> 1  2  3  4  5  6
>>>>>> 14 23 30 50 39 67
>>>>>>
>>>>>> Obviously I am doing something wrong.  Please help.  Thanks.
>>>>>>
>>>
>>> Hope this is helpful,
>>>
>>> Dan
>>>
>>> Daniel Nordlund
>>> Bothell, WA USA
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>        [[alternative HTML version deleted]]
>>
>>
>>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT