[R] glm expand model to more values

Jarek Jasiewicz jarekj at amu.edu.pl
Sat Jan 12 21:48:22 CET 2008


Charles Annis, P.E. wrote:
> Jarek:
>
> Although it is not universally agreed on, I believe the first step in any
> data analysis is to PLOT YOUR DATA.
>
> dd <- data.frame(a=c(1, 2, 3, 4, 5, 6), b=c(3,  5,  6,  7,  9, 10))
> plot(b ~ a, data=dd)
> simple.model <- lm(b~a,data=dd)
> abline(simple.model)
>
> Why do you think you need a cubic model to describe 6 observations?
>
> Your model is overparameterized - it has two more parameters than the number
> of observations can reasonably justify, something that would be obvious from
> your plot.
>
> The summary of the simple.linear model shows both the intercept and the
> slope are statistically meaningful.  (That's what the asterisks mean.)
>
> Call:
> lm(formula = b ~ a, data = dd)
>
> Residuals:
>        1        2        3        4        5        6 
> -0.23810  0.39048  0.01905 -0.35238  0.27619 -0.09524 
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)    
> (Intercept)  1.86667    0.30132   6.195  0.00345 ** 
> a            1.37143    0.07737  17.725 5.95e-05 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
>
> Residual standard error: 0.3237 on 4 degrees of freedom
> Multiple R-Squared: 0.9874,     Adjusted R-squared: 0.9843 
> F-statistic: 314.2 on 1 and 4 DF,  p-value: 5.952e-05
>
> I think you should invest a small amount of your time, and an even smaller
> amount of your money to purchase and read - cover-to-cover - one of the
> several very good books on elementary statistics and R.  My recommendation
> is _Introductory Statistics with R_ by Peter Dalgaard (Paperback - Jan 9,
> 2004).  Amazon.com carries it.
>
> Best wishes.
>
>
>
> Charles Annis, P.E.
>
> Charles.Annis at StatisticalEngineering.com
> phone: 561-352-9699
> eFax:  614-455-3265
> http://www.StatisticalEngineering.com
>  
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
> Behalf Of Jarek Jasiewicz
> Sent: Saturday, January 12, 2008 2:06 PM
> To: Charles.Annis at statisticalengineering.com
> Cc: R-help at r-project.org
> Subject: Re: [R] glm expand model to more values
>
> Charles Annis, P.E. wrote:
>   
>> How many parameters are you trying to estimate?  How many observations do
>> you have?
>>
>> What is wrong is that half of your parameter estimates are statistically
>> meaningless:
>>
>> dd <- data.frame(a=c(1, 2, 3, 4, 5, 6), b=c(3,  5,  6,  7,  9, 10))
>>
>> overparameterized.model <- glm(b~poly(a,3),data=dd)
>>
>> summary(overparameterized.model)
>>
>>
>> Coefficients:
>>             Estimate Std. Error t value Pr(>|t|)    
>> (Intercept)   6.6667     0.1725  38.644 0.000669 ***
>> poly(a, 3)1   5.7371     0.4226  13.576 0.005382 ** 
>> poly(a, 3)2  -0.1091     0.4226  -0.258 0.820395    
>> poly(a, 3)3   0.2236     0.4226   0.529 0.649562  
>>
>>
>>
>>
>> Charles Annis, P.E.
>>
>> Charles.Annis at StatisticalEngineering.com
>> phone: 561-352-9699
>> eFax:  614-455-3265
>> http://www.StatisticalEngineering.com
>>  
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
>> Behalf Of Jarek Jasiewicz
>> Sent: Saturday, January 12, 2008 11:50 AM
>> To: R-help at r-project.org
>> Subject: [R] glm expand model to more values
>>
>> Hi
>>
>> I have a problem with fitting a curve to data with lm and glm. When I
>> use a polynomial dependency, the fitted values from the model are OK, but I
>> cannot obtain the proper values when I use the coefficients to calculate them.
>> Let me present a simple example:
>>
>> I have a simple data.frame (dd):
>>  a: 1 2 3 4 5 6
>>  b: 3 5 6 7 9 10
>>
>> I try to fit it with the model:
>>
>> model <- glm(b ~ poly(a, 3), data = dd)
>> I get the following values fitted by the model (as I expected):
>>  > fitted(model)
>>         1         2         3         4         5         6
>>  3.095238  4.738095  6.095238  7.333333  8.619048 10.119048
>>
>> and coef(model)
>> (Intercept) poly(a, 3)1 poly(a, 3)2 poly(a, 3)3
>>   6.6666667   5.7370973  -0.1091089   0.2236068
>>
>> so when I try to expand the model to other data (simple extrapolation),
>> let's say: s <- seq(1, 10, by = 1)
>>
>> I do:
>> extra <- sapply(s, function(x) coef(model) %*% x^(0:3))
>> and here is result:
>> [1]  12.51826  19.49328  28.93336  42.18015  60.57528  85.46040 118.17714
>>  [8] 160.06715 212.47207 276.73354
>>
>> the data from expanding the coefficients are completely different from the fitted values
>>
>> What's going wrong?
>>
>> Jarek
>>
> Sorry, but I cannot understand. What does it mean that the data are
> statistically meaningless?
>
> It is an example with very simple data, which I use, following an example
> from the simpleR manual, to check why I cannot obtain the expected result.
> I need a simple model y ~ x^3 + x^2 + ... + z to extrapolate the data.
> Jarek
>
I understand that the data are not a good example, but I am trying to find a
rather general solution. The original data are a list of 98 data frames,
calculated by an R script of over 100 lines; I thought that was too much to
attach, so I typed a few values to illustrate the problem.

The question was asked wrongly. It should be:

Are the formulas
pol3_model <- lm(b ~ poly(a, 3))
p3_model <- lm(b ~ a + I(a^2) + I(a^3))

the same? According to the R documentation, yes:
both give the same fitted() values, but completely different coef().

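For reference, here is a minimal sketch of the comparison I mean, on the toy
data above (the raw = TRUE variant and the predict() call are only my guesses
from the documentation):

dd <- data.frame(a = 1:6, b = c(3, 5, 6, 7, 9, 10))

# same cubic fit, two parameterizations
pol3_model <- lm(b ~ poly(a, 3), data = dd)           # orthogonal polynomial basis (default)
p3_model   <- lm(b ~ a + I(a^2) + I(a^3), data = dd)  # raw powers of a

all.equal(fitted(pol3_model), fitted(p3_model))  # TRUE: identical fitted values
coef(pol3_model)  # coefficients of the orthogonal basis terms
coef(p3_model)    # coefficients of 1, a, a^2, a^3

# poly() can also return the raw-power parameterization directly
raw_model <- lm(b ~ poly(a, 3, raw = TRUE), data = dd)
coef(raw_model)   # should match coef(p3_model)

# extrapolation to new a values via predict(), whichever parameterization is used
s <- data.frame(a = seq(1, 10, by = 1))
predict(pol3_model, newdata = s)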


