[R] Multiple regression in R - unstandardised coefficients a

(Ted Harding) ted.harding at wlandres.net
Mon Aug 22 18:30:31 CEST 2011


On 22-Aug-11 15:37:40, JC Matthews wrote:
> Hello,
> 
> I have a statistical problem that I am using R for, but I am
> not making sense of the results. I am trying to use multiple
> regression to explore which variables (weather conditions)
> have the greater effect on a local atmospheric variable.
> The data is taken from a database that has 20391 data points (Z1).
> 
> A simplified version of the data I'm looking at is given below,
> but I have a problem in that there is a disagreement in sign
> between the regression coefficients and the standardised regression
> coefficients. Intuitively I would expect both to be the same sign,
> but in many of the parameters, they are not.
> 
> I am aware that there is a strong opinion that using standardised 
> correlation coefficients is highly discouraged by some people,
> but I would nevertheless like to see the results. Not least
> because it has made me doubt the non-standardised values of B
> that R has given me.
> 
> The code I have used, and some of the data, is as follows (once
> the database has been imported from SQL, and outliers removed).
> 
> Z1sub  <- Z1[, c(2, 5, 7,11, 12, 13, 15, 16)]
> colnames(Z1sub) <- c("temp", "hum", "wind", "press", "rain", "s.rad", 
> "mean1", "sd1" )
> 
> attach(Z1sub)
> names(Z1sub)
> 
> 
> Model1d <- lm(mean1 ~ hum*wind*rain +  I(hum^2) + I(wind^2) + I(rain^2)
> )
> 
> summary(Model1d)
> 
> Call:
> lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
>     I(rain^2))
> 
> Residuals:
>      Min       1Q   Median       3Q      Max
> -1230.64   -63.17    18.51    97.85  1275.73
> 
> Coefficients:
>                 Estimate Std. Error t value Pr(>|t|)
> (Intercept)   -9.243e+02  5.689e+01 -16.246  < 2e-16 ***
> hum            2.835e+01  1.468e+00  19.312  < 2e-16 ***
> wind           1.236e+02  4.832e+00  25.587  < 2e-16 ***
> rain          -3.144e+03  7.635e+02  -4.118 3.84e-05 ***
> I(hum^2)      -1.953e-01  9.393e-03 -20.793  < 2e-16 ***
> I(wind^2)      6.914e-01  2.174e-01   3.181  0.00147 **
> I(rain^2)      2.730e+02  3.265e+01   8.362  < 2e-16 ***
> hum:wind      -1.782e+00  5.448e-02 -32.706  < 2e-16 ***
> hum:rain       2.798e+01  8.410e+00   3.327  0.00088 ***
> wind:rain      6.018e+02  2.146e+02   2.805  0.00504 **
> hum:wind:rain -6.606e+00  2.401e+00  -2.751  0.00594 **
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Residual standard error: 180.5 on 20337 degrees of freedom
> Multiple R-squared: 0.2394,     Adjusted R-squared: 0.239
> F-statistic: 640.2 on 10 and 20337 DF,  p-value: < 2.2e-16
> 
> 
> 
> 
> 
> To calculate the standardised coefficients, I used the following:
> 
> Z1sub.scaled <- data.frame(scale( Z1sub[,c('temp', 'hum', 'wind',
> 'press', 
> 'rain', 's.rad', 'mean1', 'sd1' ) ] ) )
> 
> attach(Z1sub.scaled)
> names(Z1sub.scaled)
> 
> 
> Model1d.sc <- lm(mean1 ~ hum*wind*rain +  I(hum^2) + I(wind^2) +
> I(rain^2) )
> 
> summary(Model1d.sc)
> 
> Call:
> lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
>     I(rain^2))
> 
> Residuals:
>      Min       1Q   Median       3Q      Max
> -5.94713 -0.30527  0.08946  0.47287  6.16503
> 
> Coefficients:
>                 Estimate Std. Error t value Pr(>|t|)
> (Intercept)    0.0806858  0.0096614   8.351  < 2e-16 ***
> hum           -0.4581509  0.0073456 -62.371  < 2e-16 ***
> wind          -0.1995316  0.0073767 -27.049  < 2e-16 ***
> rain          -0.1806894  0.0158037 -11.433  < 2e-16 ***
> I(hum^2)      -0.1120435  0.0053885 -20.793  < 2e-16 ***
> I(wind^2)      0.0172870  0.0054346   3.181  0.00147 **
> I(rain^2)      0.0040575  0.0004853   8.362  < 2e-16 ***
> hum:wind      -0.2188729  0.0066659 -32.835  < 2e-16 ***
> hum:rain       0.0267420  0.0146201   1.829  0.06740 .
> wind:rain      0.0365615  0.0122335   2.989  0.00281 **
> hum:wind:rain -0.0438790  0.0159479  -2.751  0.00594 **
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Residual standard error: 0.8723 on 20337 degrees of freedom
> Multiple R-squared: 0.2394,     Adjusted R-squared: 0.239
> F-statistic: 640.2 on 10 and 20337 DF,  p-value: < 2.2e-16
> 
> 
> 
> So having, for instance for humidity (hum), B = 28.35 +/- 1.468, while
> Beta = -0.4581509 +/- 0.0073456 is concerning. Is this normal, or is
> there an error in my code that has caused this contradiction?
> 
> Many thanks,
> 
> James.
> ----------------------
> JC Matthews
> School of Chemistry
> Bristol University

Hi,
Without your data I cannot check this directly, but I would not be
surprised if the changes of sign were a consequence of your model
formula, in particular its interaction terms (up to the 3-variable,
2nd-order interaction): you are using a model which is non-linear
in the variables themselves. Let's just take that part of the model:

  lm(formula = mean1 ~ hum * wind * rain)

This, in its quantitative expression, expands to:

  mean1 = C0 + C11*hum + C12*wind + C13*rain
             + C21*hum*wind + C22*hum*rain + C23*wind*rain
             + C31*hum*wind*rain
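
For illustration, the expansion can be checked in R without any data,
by asking for the term labels that the crossing operator generates.
The name term_labels is chosen here only for the sketch:

```r
# Inspect how R expands the crossing operator '*' in a model formula.
f <- mean1 ~ hum * wind * rain

# terms() parses the formula; "term.labels" lists every generated term
# (main effects, two-way interactions, and the three-way interaction).
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
```

Together with the intercept, these seven terms correspond to the
coefficients C0, C11, ..., C31 in the expression above.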

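Suppose the expansion above is for the unstandardised variables. Now
express it in terms of the standardised variables (initial capital
letters), e.g. Hum = (hum - mean(hum))/sd(hum), so that
hum = sd(hum)*(Hum + mean(hum)/sd(hum)), and similarly for Wind and Rain: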

  mean1 = C0 + C11*sd(hum)*(Hum + mean(hum)/sd(hum))
             + C12*sd(wind)*(Wind + mean(wind)/sd(wind))
             + C13*sd(rain)*(Rain + mean(rain)/sd(rain))

             + C21*sd(hum)*sd(wind)*
                   (Hum + mean(hum)/sd(hum))*(Wind + mean(wind)/sd(wind))

             + C22*sd(hum)*sd(rain)*
                   (Hum + mean(hum)/sd(hum))*(Rain + mean(rain)/sd(rain))

             + C23*sd(wind)*sd(rain)*
                   (Wind + mean(wind)/sd(wind))*
                   (Rain + mean(rain)/sd(rain))

             + C31*sd(hum)*sd(wind)*sd(rain)*
                 (Hum + mean(hum)/sd(hum))*
                 (Wind + mean(wind)/sd(wind))*
                 (Rain + mean(rain)/sd(rain))

Now pick out, say, the coefficient of 'Hum' in this latter expression
(i.e. all the terms which involve 'Hum' but neither 'Wind' nor 'Rain'):

  C11*sd(hum)
+ C21*sd(hum)*sd(wind)*mean(wind)/sd(wind)
+ C22*sd(hum)*sd(rain)*mean(rain)/sd(rain)
+ C31*sd(hum)*sd(wind)*sd(rain)*
      (mean(wind)/sd(wind))*(mean(rain)/sd(rain))

= C11*sd(hum)
+ C21*sd(hum)*mean(wind)
+ C22*sd(hum)*mean(rain)
+ C31*sd(hum)*mean(wind)*mean(rain)

So there is no reason to expect this to have even the same sign
as the original C11, the coefficient of 'hum', let alone any more
specific relationship with it!
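
The relationship can be checked numerically. The following is a
minimal sketch on simulated data: the data frame d, its means and
standard deviations, and the "true" coefficients are all hypothetical
and are not the original Z1 data. Because the response is standardised
here as well (as in the question), the expression for the coefficient
of 'Hum' is additionally divided by sd(mean1), which rescales it but
cannot change its sign.

```r
set.seed(1)
n <- 2000

# Hypothetical predictors whose means are far from zero.
d <- data.frame(
  hum  = rnorm(n, mean = 80, sd = 10),
  wind = rnorm(n, mean = 5,  sd = 2),
  rain = rnorm(n, mean = 1,  sd = 0.5)
)
# Hypothetical response: positive 'hum' slope, negative hum:wind interaction.
d$mean1 <- with(d, 10 + 3 * hum + 20 * wind + 5 * rain -
                   0.8 * hum * wind + rnorm(n, sd = 5))

# Same formula fitted to raw and to standardised variables.
# 'data =' is used rather than attach() so the two fits cannot be confused.
fit_raw <- lm(mean1 ~ hum * wind * rain, data = d)
ds      <- as.data.frame(scale(d))
fit_std <- lm(mean1 ~ hum * wind * rain, data = ds)

C <- coef(fit_raw)   # C11, C21, ... in the notation above
m <- colMeans(d)     # sample means
s <- sapply(d, sd)   # sample standard deviations

# Coefficient of 'Hum' reconstructed from the raw fit, then divided by
# sd(mean1) because the response was standardised too.
predicted_beta_hum <- unname(
  (C["hum"] +
   C["hum:wind"]      * m["wind"] +
   C["hum:rain"]      * m["rain"] +
   C["hum:wind:rain"] * m["wind"] * m["rain"]) * s["hum"] / s["mean1"]
)
beta_hum <- unname(coef(fit_std)["hum"])

print(c(raw = unname(C["hum"]), standardised = beta_hum,
        reconstructed = predicted_beta_hum))
```

The reconstructed value should agree with the fitted standardised
coefficient up to rounding error, while the raw 'hum' coefficient has
the opposite sign. In the full model from the question, the I(hum^2)
term would contribute a further 2*C*mean(hum)*sd(hum) to the
coefficient of 'Hum', where C is its raw coefficient.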

Hoping this helps,
Ted.



--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Aug-11                                       Time: 17:30:29
------------------------------ XFMail ------------------------------


