[R] Fitting linear models

Vemuri, Aparna avemuri at epri.com
Tue Apr 21 18:37:40 CEST 2009


These are all field measured values. 
For a little background here, I have field measurements of SO4, NO3 and NH4. I used these variables in an atmospheric chemistry model to calculate PBW on a line-by-line basis. 

To bypass the complex atmospheric chemistry model in the future, I want to develop a regression equation based on my current results. Since the atmospheric chemistry model requires SO4, NO3 and NH4 to estimate PBW, I am using the same variables as IVs for the regression model.
Aparna 
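
For readers following the thread, here is a minimal sketch of how an exact linear dependence among predictors produces an NA coefficient whose position depends on the order of the terms. The data are synthetic, and making NH4 an exact multiple of SO4 (the factor 0.375 and the name toy are made up) is an assumption for illustration only, not a claim about the field measurements.

```r
# Synthetic data only; none of these values come from the attachment
# discussed in the thread.
set.seed(1)
n   <- 100
SO4 <- runif(n, 1, 10)
NO3 <- runif(n, 0, 2)
NH4 <- 0.375 * SO4   # assumption: NH4 is an exact multiple of SO4
PBW <- 0.02 * SO4 + 0.02 * NO3 + rnorm(n, sd = 0.001)
toy <- data.frame(SO4, NO3, NH4, PBW)

# Same model, two term orders.
fit_a <- lm(PBW ~ SO4 + NO3 + NH4, data = toy)
fit_b <- lm(PBW ~ NH4 + NO3 + SO4, data = toy)

print(coef(fit_a))   # NH4 is aliased with SO4 and reported as NA
print(coef(fit_b))   # now SO4 comes last of the dependent pair and is NA

# alias() reports which term is a linear combination of the others.
print(alias(fit_a)$Complete)

# The fitted values are the same regardless of which term is dropped.
print(all.equal(fitted(fit_a), fitted(fit_b)))
```

In this sketch lm() drops whichever member of the dependent pair appears later in the formula, so the NA moves with the term order while the fit itself does not change.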

-----Original Message-----
From: Dimitri Liakhovitski [mailto:ld7631 at gmail.com] 
Sent: Tuesday, April 21, 2009 9:31 AM
To: Vemuri, Aparna
Subject: Re: [R] Fitting linear models

Aparna, why are your IVs so highly intercorrelated? It's not a good sign...
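
One way to quantify this kind of intercorrelation is sketched below, using base R only. The data are synthetic, and the near-proportional relationship between NH4 and SO4 (and the names ivs, r2_NH4 and vif_NH4) is an assumption for illustration, not taken from the posted file. The variance inflation factor is computed by hand as 1 / (1 - R^2) from regressing one IV on the others.

```r
# Synthetic predictors; values are illustrative, not the thread's data.
set.seed(2)
n   <- 100
SO4 <- runif(n, 1, 10)
NO3 <- runif(n, 0, 2)
NH4 <- 0.375 * SO4 + rnorm(n, sd = 0.02)  # assumption: nearly proportional to SO4
ivs <- data.frame(SO4, NO3, NH4)

# Pairwise correlations among the IVs.
r <- cor(ivs)
print(round(r, 3))

# Variance inflation factor for NH4: regress it on the other IVs
# and apply VIF = 1 / (1 - R^2).
r2_NH4  <- summary(lm(NH4 ~ SO4 + NO3, data = ivs))$r.squared
vif_NH4 <- 1 / (1 - r2_NH4)
print(vif_NH4)
```

With near (rather than exact) collinearity, lm() still returns a coefficient for every term, but a large VIF signals that the individual estimates are poorly determined.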

On Tue, Apr 21, 2009 at 12:29 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
> But if the multicollinearity is so strong, then I am wondering why it
> worked in the data frame as opposed to 4 separate vectors? It should
> not make any difference...
> Dimitri
>
> On Tue, Apr 21, 2009 at 12:21 PM, Vemuri, Aparna <avemuri at epri.com> wrote:
>> Thanks Dimitri! Following exactly what you did, I wrote all my individual variable vectors to a data frame and used lm(formula,data) and this time it works for me too.
>>
>> Marc, your theory is correct. The NH4 variable is strongly correlated with one of the IVs (SO4) as well as with the DV.
>>          SO4       NO3       NH4       PBW
>> SO4   1        -0.0867    0.999     0.999
>> NO3  -0.0867    1        -0.0527   -0.0938
>> NH4   0.999    -0.0527    1         0.999
>> PBW   0.999    -0.0938    0.999     1
>>
>>
>> Aparna
>>
>> -----Original Message-----
>> From: Dimitri Liakhovitski [mailto:ld7631 at gmail.com]
>> Sent: Tuesday, April 21, 2009 9:02 AM
>> To: Vemuri, Aparna
>> Cc: r-help at r-project.org; David Winsemius
>> Subject: Re: [R] Fitting linear models
>>
>> I am not sure what the problem is.
>> I found no errors:
>>
>> data<-read.csv(file.choose())  # I had to change your file extension to .csv first
>> dim(data)
>> names(data)
>>
>> lapply(data,function(x){sum(is.na(x))})
>> lm.model.1<-lm(PBW~SO4+NO3+NH4,data)
>> lm.model.2<-lm(PBW~SO4+NH4+NO3,data)
>> print(lm.model.1)  # Getting nice results
>> print(lm.model.2) # Getting same results
>>
>> # Another method (gets exactly the same results):
>> library(Design)
>> ols.model.1<-ols(PBW~SO4+NO3+NH4,data)
>> ols.model.2<-ols(PBW~SO4+NH4+NO3,data)
>>
>> Dimitri
>> On Tue, Apr 21, 2009 at 11:50 AM, Vemuri, Aparna <avemuri at epri.com> wrote:
>>> Attached are the first hundred rows of my data in comma separated format.
>>> Forcing the regression line through the origin still does not give a coefficient on the last independent variable. Also, I don't mind if there is a coefficient on the dependent axis (an intercept). I just want all of the variables to have coefficients in the regression equation, or at least a consistent result irrespective of the order in which the variables are entered.
>>>
>>> -----Original Message-----
>>> From: David Winsemius [mailto:dwinsemius at comcast.net]
>>> Sent: Tuesday, April 21, 2009 8:38 AM
>>> To: Vemuri, Aparna
>>> Cc: r-help at r-project.org
>>> Subject: Re: [R] Fitting linear models
>>>
>>>
>>> On Apr 21, 2009, at 11:12 AM, Vemuri, Aparna wrote:
>>>
>>>> David,
>>>> Thanks for the suggestions. No, I did not label my dependent
>>>> variable "function".
>>>
>>> That was from my error in reading your call to lm. In my defense I am
>>> reasonably sure the proper assignment to arguments is lm(formula= ...)
>>> rather than lm(function= ...).
>>>>
>>>>
>>>> My dependent variable PBW and all the independent variables are
>>>> continuous variables. It is especially troubling since the order in
>>>> which I input independent variables determines whether or not it
>>>> gets a coefficient.  Like I already mentioned, I checked the
>>>> correlation matrix and picked the variables with moderate to high
>>>> correlation with the dependent variable. So I guess it is not so
>>>> naïve to expect a regression coefficient on all of them.
>>>>
>>>> Dimitri
>>>> model1<-lm(PBW~SO4+NO3+NH4), gives me the same result as before.
>>>
>>> Did you get the expected results with:
>>> model1<-lm(formula=PBW~SO4+NO3+NH4+0)
>>>
>>> You could, of course, provide either the data or the results of str()
>>> applied to each of the variables and then we could all stop guessing.
>>>
>>>>
>>>> Aparna
>>>>
>>>>>
>>>>>
>>>>> I am using the lm() function in R to fit a dependent variable to a
>>>>> set of 3 to 5 independent variables. For this, I used the following
>>>>> commands:
>>>>>
>>>>>> model1<-lm(function=PBW~SO4+NO3+NH4)
>>>>> Coefficients:
>>>>> (Intercept)          SO4          NO3      NH4
>>>>>   0.01323      0.01968      0.01856           NA
>>>>>
>>>>> and
>>>>>
>>>>>> model2<-lm(function=PBW~SO4+NO3+NH4+Na+Cl)
>>>>>
>>>>> Coefficients:
>>>>> (Intercept)         SO4         NO3         NH4          Na          Cl
>>>>>  -0.0006987  -0.0119750  -0.0295042   0.0842989   0.1344751          NA
>>>>>
>>>>> In both cases, the last independent variable has a coefficient of NA
>>>>> in the result. I say last variable because, when I change the order
>>>>> of the variables, the coefficient changes (see below). Can anyone
>>>>> point me to the reason R behaves this way? Is there any way for me to
>>>>> force R to use all the variables? I checked the correlation matrices
>>>>> to make sure there is no orthogonality between the variables.
>>>>
>>>> You really did not name your dependent variable "function" did you?
>>>> Please stop that.
>>>>
>>>> Just a guess, ... since you have not provided enough information to do
>>>> otherwise, ... Are all of those variables 1/0 dummy variables? If so
>>>> and if you want to have an output that satisfies your need for
>>>> labeling the coefficients as you naively anticipate, then put "0+" at
>>>> the beginning of the formula or "-1" at the end, so that the intercept
>>>> will disappear and then all variables will get labeled as you expect.
>>> --
>>> David Winsemius, MD
>>> Heritage Laboratories
>>> West Hartford, CT
>>>
>>>
>>
>>
>>
>> --
>> Dimitri Liakhovitski
>> MarketTools, Inc.
>> Dimitri.Liakhovitski at markettools.com
>>



-- 
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com



