[R] Fwd: Using R for Multiple Regression

Greg Snow Greg.Snow at imail.org
Tue Aug 3 22:18:24 CEST 2010


There are an infinite number of solutions for your example, so I hope you really don't want to see all of them.  In theory you could work up some code to start showing them to you, but the sun will go nova and atomize you and your computer before it shows all of them.

Expressing the infinite set of solutions in a simple, finite form is a common homework problem in basic linear algebra classes.
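
A minimal sketch of that finite description for the data in this thread, assuming the relations x2 = 13 + x1 and x3 = 3*x1 - 1 that Ted pointed out. Substituting them into the model shows that m2 and m3 can be chosen freely, with c and m1 then fixed. The helper names `solution` and `fits_exactly` are made up for illustration.

```r
# Data from the original post
d <- data.frame(
  y  = c(91, 102, 113, 124, 135, 146, 157, 168, 179, 190),
  x1 = 1:10,
  x2 = 14:23,
  x3 = seq(2, 29, by = 3)
)

# Hypothetical helper: choose any m2 and m3; c and m1 follow from
#   c + 13*m2 - m3 = 80   and   m1 + m2 + 3*m3 = 11
solution <- function(m2, m3) {
  c(c = 80 - 13 * m2 + m3, m1 = 11 - m2 - 3 * m3, m2 = m2, m3 = m3)
}

# Hypothetical helper: does this coefficient vector reproduce y exactly?
fits_exactly <- function(b) {
  fitted <- b[["c"]] + b[["m1"]] * d$x1 + b[["m2"]] * d$x2 + b[["m3"]] * d$x3
  isTRUE(all.equal(fitted, d$y))
}

fits_exactly(solution(0, 0))     # the answer lm reported: c = 80, m1 = 11
fits_exactly(solution(6, 0))     # Ted's example: c = 2, m1 = 5, m2 = 6
fits_exactly(solution(-2.5, 7))  # an arbitrary member of the family
```

Every pair (m2, m3) gives another exact fit, which is why listing them all is hopeless but describing them takes two lines.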

You can see some of the most obvious other solutions by giving lm the predictor variables in different orders. Some others could be found using generalized inverses (but they are probably not that interesting).  Penalized models may also give some different solutions, but whether they have any meaning depends on your understanding and the question you are trying to answer.  There are other possibilities as well, but if you don't already have a good understanding of what is going on, they tend to be more dangerous than enlightening.
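
As a rough illustration of the first two options, using the data from the original post: the sketch below refits with the predictors in a different order, then computes one generalized-inverse solution from the singular value decomposition in base R. The tolerance used to decide which singular values count as zero is a common convention and an assumption here, not something lm uses.

```r
d <- data.frame(
  y  = c(91, 102, 113, 124, 135, 146, 157, 168, 179, 190),
  x1 = 1:10,
  x2 = 14:23,
  x3 = seq(2, 29, by = 3)
)

# Same data, different term order: lm keeps the first usable predictor
# and reports NA for the ones that are redundant given it.
b_x1_first <- coef(lm(y ~ x1 + x2 + x3, data = d))
b_x2_first <- coef(lm(y ~ x2 + x1 + x3, data = d))
b_x3_first <- coef(lm(y ~ x3 + x1 + x2, data = d))

# One generalized-inverse (minimum-norm) solution via the SVD.
X <- cbind(1, d$x1, d$x2, d$x3)
s <- svd(X)
keep <- s$d > max(dim(X)) * max(s$d) * .Machine$double.eps  # assumed cutoff
b_min <- s$v[, keep] %*% ((t(s$u[, keep]) %*% d$y) / s$d[keep])

# It spreads weight over all four columns but still reproduces y.
all.equal(drop(X %*% b_min), d$y)
```

All of these describe the same fitted line; they differ only in how the credit is split among redundant columns.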


The primary difference in a 0 vs. NA for the coefficient estimate is in telling you what the computer did or did not do.

A 0 means that the computer estimated the slope and found that the best estimate is 0; NA means that the slope was not estimated at all, because no meaningful estimate could be given.  For prediction you could use a 0 slope, but that is more work than just leaving the term out.  For inference and understanding there is a huge difference.
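
A small sketch of the prediction point, again with the data from the original post: replacing the NA coefficients with 0 gives the same fitted values as refitting with only x1. The object names are just for illustration.

```r
d <- data.frame(
  y  = c(91, 102, 113, 124, 135, 146, 157, 168, 179, 190),
  x1 = 1:10,
  x2 = 14:23,
  x3 = seq(2, 29, by = 3)
)

fit <- lm(y ~ x1 + x2 + x3, data = d)
b <- coef(fit)                    # x2 and x3 come back as NA

# The long way: treat the NA slopes as 0 and multiply out by hand.
b0 <- ifelse(is.na(b), 0, b)
pred0 <- drop(model.matrix(fit) %*% b0)

# The short way: leave the redundant terms out of the model.
fit1 <- lm(y ~ x1, data = d)
all.equal(unname(pred0), unname(fitted(fit1)))

# alias() reports how the dropped terms depend on the ones that were kept.
alias(fit)
```

The predictions agree, but nothing here says that x2 or x3 has no effect; the data simply cannot separate their effects from that of x1.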

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ambikesh Jayal
> Sent: Sunday, August 01, 2010 7:25 AM
> To: r-help at r-project.org
> Subject: [R] Fwd: Using R for Multiple Regression
> 
> ---------- Forwarded message ----------
> From: Ambikesh Jayal <ambi1999 at gmail.com>
> Date: Sun, Aug 1, 2010 at 2:24 PM
> Subject: Re: [R] Using R for Multiple Regression
> To: ted.harding at manchester.ac.uk
> 
> 
> Hi Ted,
> 
> Thanks to all those who have replied. It was very helpful.
> 
> As there can be multiple solutions, is there a way in R to show all the
> possible models for a dataset?
> 
> Also, in R, is a coefficient of an independent variable shown as "NA"
> the same as one shown as "0" (implying that this variable does not
> count)?
> 
> > However, in trying it out as you have, you have already found out
> > something very important about linear regression! (And about R).
> 
> Is the important point that there can be multiple equations describing
> a dataset? Or that one way to simplify a model is to remove the
> independent variables that depend on other independent variables?
> 
> 
> Thanks again.
> 
> Kind regards
> 
> Ambikesh Jayal,
> Department of Information Systems, Computing and Mathematics
> Room 134 St John's Building
> Brunel University
> Uxbridge, Middlesex
> UB8 3PH, UK
> Website: http://sites.google.com/site/ambi1999/
> 
> 
> 
> 
> On Fri, Jul 30, 2010 at 5:59 PM, Ted Harding
> <Ted.Harding at manchester.ac.uk> wrote:
> 
> > On 30-Jul-10 15:07:46, Ambikesh Jayal wrote:
> > > Hi,
> > > Subject: Using R for Multiple Regression
> > >
> > > I am new to statistics but am interested in applying mathematical
> > > models to solve biological problems. I have used a linear model
> > > to generate the test data. When using this data I expect R to
> > > correctly identify the model but that does not seem to be the case.
> > > I am certain that I am doing something wrong but not able to figure
> > > it out.
> > >
> > > Model:
> > > Y = m1*x1 + m2*x2 + m3*x3 + c
> > >
> >
> > >
> > > Model Identified by R using lm(formula = y ~ x1 + x2 + x3)
> > > (Intercept) 8.000e+01
> > > x1          1.100e+01
> > > x2                 NA
> > > x3                 NA
> > >
> > >
> > > The data I am using is as follows:
> > >
> > > y x1 x2 x3
> > > 91 1 14 2
> > > 102 2 15 5
> > > 113 3 16 8
> > > 124 4 17 11
> > > 135 5 18 14
> > > 146 6 19 17
> > > 157 7 20 20
> > > 168 8 21 23
> > > 179 9 22 26
> > > 190 10 23 29
> > >
> > > Kind regards
> > > Dr. Ambikesh Jayal,
> >
> > You should look again at your data!
> >
> > You have x2 = 13 + x1, x3 = 3*x1 - 1 in these data.
> > Hence your model
> >
> >  Y = m1*x1 + m2*x2 + m3*x3 + c
> >
> > with m1=5, m2=6, m3=0, c=2 is the same as
> >
> >  Y = 5*x1 + 6*(x1+13) + 0*(3*x1 - 1) + 2
> >    = 11*x1 + 6*13 + 2
> >    = 11*x1 + 80
> >
> > and R has found that the coefficient of x1 is 1.100e+01 = 11,
> > and that the intercept is 8.000e+01 = 80, and has also identified
> > that, after allowing for x1, x2 and x3 are irrelevant.
> >
> > So, to try out how R behaves in linear regression, you should
> > use data which do not have this property that some of the independent
> > variables (x1,x2,x3) are linear functions of the others.
> >
> > However, in trying it out as you have, you have already found out
> > something very important about linear regression! (And about R).
> >
> > Hoping this helps,
> > Ted.
> >
> > --------------------------------------------------------------------
> > E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> > Fax-to-email: +44 (0)870 094 0861
> > Date: 30-Jul-10                                       Time: 17:59:51
> > ------------------------------ XFMail ------------------------------
> >
> 


