[R] Multiple regression in R - unstandardised coefficients are a different sign to standardised coefficients, is this correct?

Ista Zahn izahn at psych.rochester.edu
Mon Aug 22 18:02:03 CEST 2011


Hi JC,

You have interactions in your model, which means that your model
specifies that the coefficients for hum, wind, and rain vary
depending on the values of the other two (and on their own
values, since you also have quadratic effects for each of
these variables in your model). Since these coefficients vary
according to the model, it is impossible to specify their values
unconditionally. The values you are seeing are therefore conditional
estimates: each one is the slope for that predictor when the variables
it interacts with are equal to zero. Since standardizing moves the zero
point of those variables to their means, you get different conditional
estimates, and they can differ in sign.
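
To make this concrete, here is a minimal sketch on simulated data (not
your Z1 data; the generating coefficients, sample size, and seed are
made up for illustration, and the quadratic terms are left out to keep
it short). It fits the same two-way interaction model to raw and to
standardized variables, then shows that the standardized coefficient for
hum is just the raw conditional slope evaluated at the mean of wind,
rescaled by sd(hum)/sd(mean1).

```r
# Simulated data only: the true coefficients below are arbitrary assumptions.
set.seed(1)
n     <- 1000
hum   <- runif(n, 40, 100)
wind  <- runif(n, 0, 20)
mean1 <- 5 + 2 * hum + 3 * wind - 0.5 * hum * wind + rnorm(n, sd = 5)
d     <- data.frame(hum, wind, mean1)

# Same model on raw and on standardized (mean 0, sd 1) variables.
fit.raw <- lm(mean1 ~ hum * wind, data = d)
d.sc    <- data.frame(scale(d))
fit.sc  <- lm(mean1 ~ hum * wind, data = d.sc)

b <- coef(fit.raw)

# Raw coefficient: slope of hum when wind = 0 (positive here).
b["hum"]

# Raw slope of hum when wind is at its mean (negative here).
slope.at.mean <- b["hum"] + b["hum:wind"] * mean(d$wind)

# Rescale that slope to standard-deviation units ...
beta.from.raw <- unname(slope.at.mean * sd(d$hum) / sd(d$mean1))

# ... and compare with the coefficient from the standardized fit.
beta.hum <- unname(coef(fit.sc)["hum"])
all.equal(beta.from.raw, beta.hum)
```

Both fits describe the same surface; only the point at which the slope
of hum is reported has moved from wind = 0 to wind = mean(wind). Passing
data = to lm() rather than using attach() also avoids any doubt about
which data frame a variable name refers to.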

All this will be covered in most regression textbooks.

Best,
Ista

On Mon, Aug 22, 2011 at 11:37 AM, JC Matthews
<J.C.Matthews at bristol.ac.uk> wrote:
>
> Hello,
>
> I have a statistical problem that I am using R for, but I am not making
> sense of the results. I am trying to use multiple regression to explore
> which variables (weather conditions) have the greater effect on a local
> atmospheric variable. The data is taken from a database that has 20391 data
> points (Z1).
>
> A simplified version of the data I'm looking at is given below, but I have a
> problem in that there is a disagreement in sign between the regression
> coefficients and the standardised regression coefficients. Intuitively I
> would expect both to be the same sign, but in many of the parameters, they
> are not.
>
> I am aware that some people strongly discourage the use of standardised
> regression coefficients, but I would nevertheless like to see the results,
> not least because the discrepancy has made me doubt the non-standardised
> values of B that R has given me.
>
> The code I have used, and some of the data, is as follows (once the database
> has been imported from SQL, and outliers removed).
>
>
>
> Z1sub  <- Z1[, c(2, 5, 7,11, 12, 13, 15, 16)]
> colnames(Z1sub) <- c("temp", "hum", "wind", "press", "rain", "s.rad",
> "mean1", "sd1" )
>
> attach(Z1sub)
> names(Z1sub)
>
>
> Model1d <- lm(mean1 ~ hum*wind*rain +  I(hum^2) + I(wind^2) + I(rain^2) )
>
> summary(Model1d)
>
> Call:
> lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
>   I(rain^2))
>
> Residuals:
>    Min       1Q   Median       3Q      Max
> -1230.64   -63.17    18.51    97.85  1275.73
>
> Coefficients:
>               Estimate Std. Error t value Pr(>|t|)
> (Intercept)   -9.243e+02  5.689e+01 -16.246  < 2e-16 ***
> hum            2.835e+01  1.468e+00  19.312  < 2e-16 ***
> wind           1.236e+02  4.832e+00  25.587  < 2e-16 ***
> rain          -3.144e+03  7.635e+02  -4.118 3.84e-05 ***
> I(hum^2)      -1.953e-01  9.393e-03 -20.793  < 2e-16 ***
> I(wind^2)      6.914e-01  2.174e-01   3.181  0.00147 **
> I(rain^2)      2.730e+02  3.265e+01   8.362  < 2e-16 ***
> hum:wind      -1.782e+00  5.448e-02 -32.706  < 2e-16 ***
> hum:rain       2.798e+01  8.410e+00   3.327  0.00088 ***
> wind:rain      6.018e+02  2.146e+02   2.805  0.00504 **
> hum:wind:rain -6.606e+00  2.401e+00  -2.751  0.00594 **
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 180.5 on 20337 degrees of freedom
> Multiple R-squared: 0.2394,     Adjusted R-squared: 0.239
> F-statistic: 640.2 on 10 and 20337 DF,  p-value: < 2.2e-16
>
>
>
>
>
> To calculate the standardised coefficients, I used the following:
>
> Z1sub.scaled <- data.frame(scale( Z1sub[,c('temp', 'hum', 'wind', 'press',
> 'rain', 's.rad', 'mean1', 'sd1' ) ] ) )
>
> attach(Z1sub.scaled)
> names(Z1sub.scaled)
>
>
> Model1d.sc <- lm(mean1 ~ hum*wind*rain +  I(hum^2) + I(wind^2) + I(rain^2) )
>
> summary(Model1d.sc)
>
> Call:
> lm(formula = mean1 ~ hum * wind * rain + I(hum^2) + I(wind^2) +
>   I(rain^2))
>
> Residuals:
>    Min       1Q   Median       3Q      Max
> -5.94713 -0.30527  0.08946  0.47287  6.16503
>
> Coefficients:
>               Estimate Std. Error t value Pr(>|t|)
> (Intercept)    0.0806858  0.0096614   8.351  < 2e-16 ***
> hum           -0.4581509  0.0073456 -62.371  < 2e-16 ***
> wind          -0.1995316  0.0073767 -27.049  < 2e-16 ***
> rain          -0.1806894  0.0158037 -11.433  < 2e-16 ***
> I(hum^2)      -0.1120435  0.0053885 -20.793  < 2e-16 ***
> I(wind^2)      0.0172870  0.0054346   3.181  0.00147 **
> I(rain^2)      0.0040575  0.0004853   8.362  < 2e-16 ***
> hum:wind      -0.2188729  0.0066659 -32.835  < 2e-16 ***
> hum:rain       0.0267420  0.0146201   1.829  0.06740 .
> wind:rain      0.0365615  0.0122335   2.989  0.00281 **
> hum:wind:rain -0.0438790  0.0159479  -2.751  0.00594 **
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.8723 on 20337 degrees of freedom
> Multiple R-squared: 0.2394,     Adjusted R-squared: 0.239
> F-statistic: 640.2 on 10 and 20337 DF,  p-value: < 2.2e-16
>
>
>
> So having, for instance for humidity (hum), B = 28.35 +/-  1.468, while Beta
> = -0.4581509 +/- 0.0073456 is concerning. Is this normal, or is there an
> error in my code that has caused this contradiction?
>
> Many thanks,
>
> James.
>
>
> ----------------------
> JC Matthews
> School of Chemistry
> Bristol University
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


