[R] Comparing two regression lines

John Fox jfox at mcmaster.ca
Wed Jun 4 15:15:19 CEST 2008


Dear Christoph,

If I've understood properly what you intend to do, no, it doesn't make
sense. I assume from your description that you don't have two independent
samples, but rather a single sample of n observations on which the
variables x, y1, and y2 are all measured. If that's correct, then
typically y1 and y2 would not be independent.

If my understanding is correct, then you should be able to do what you want
by fitting a multivariate regression and testing for the difference of the
two slopes. Of course, it must make sense to compare the slopes -- e.g., y1
and y2 must be measured in the same units. 

Here's a nonsense example illustrating how you could proceed using the
linear.hypothesis() function in the car package:

> library(car)
> mod <- lm(cbind(income, education) ~ prestige, data=Duncan)
> Anova(mod)

Type II MANOVA Tests: Pillai test statistic
         Df test stat approx F num Df den Df    Pr(>F)    
prestige  1     0.828  101.216      2     42 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

> linear.hypothesis(mod, "prestige", P=matrix(c(-1, 1), 2, 1))

 Response transformation matrix:
          [,1]
income      -1
education    1

Sum of squares and products for the hypothesis:
         [,1]
[1,] 1048.684

Sum of squares and products for error:
         [,1]
[1,] 17828.96

Multivariate Tests: 
                 Df test stat  approx F num Df den Df  Pr(>F)
Pillai            1 0.0555517 2.5292235      1     43 0.11908
Wilks             1 0.9444483 2.5292235      1     43 0.11908
Hotelling-Lawley  1 0.0588192 2.5292235      1     43 0.11908
Roy               1 0.0588192 2.5292235      1     43 0.11908

Note that in this case the "multivariate" test is really univariate.
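
Another way to see this: with the response-transformation matrix (-1, 1)',
the hypothesis is simply that the slope for the difference education -
income is zero, so (as a sketch on the same Duncan data) you should be
able to reproduce the test with an ordinary regression on the difference;
the F for prestige should match the approximate F of 2.529 on 1 and 43 df
shown above:

> # univariate equivalent of the multivariate test above
> mod.diff <- lm(I(education - income) ~ prestige, data=Duncan)
> anova(mod.diff)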

If you apply your approach to this example, you get:

> attach(Duncan)
> y <- c(income, education)
> x <- c(prestige, prestige)
> f <- factor(rep(c("income", "education"), c(45, 45)))
> mod.univ <- lm(y ~ x*f)
> Anova(mod.univ)
Anova Table (Type II tests)

Response: y
          Sum Sq Df F value    Pr(>F)    
x          46199  1 214.549 < 2.2e-16 ***
f           2571  1  11.938 0.0008563 ***
x:f          524  1   2.435 0.1223237    
Residuals  18519 86                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The sum of squares for the x:f interaction is exactly half the SS for the
hypothesis of equal slopes from the multivariate linear model, but the sum
of squares for error is off, because stacking the two responses treats y1
and y2 as independent and ignores the covariance between them. The net
result is a test statistic that is slightly off (and has twice the
residual df it should: 86 rather than 43).
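
You can see where the error SS goes wrong from the residuals of the two
separate fits (again a sketch on the same data): the stacked model simply
adds the two residual sums of squares, while the correct error SS for the
difference also subtracts twice their cross-product.

> e1 <- residuals(lm(income ~ prestige, data=Duncan))
> e2 <- residuals(lm(education ~ prestige, data=Duncan))
> sum(e1^2) + sum(e2^2)  # error SS of the stacked model (the 18519 above)
> sum((e2 - e1)^2)       # error SS for the difference (the 17828.96 above)
> # the gap is 2*sum(e1*e2), the residual cross-product that the stacked
> # model ignores by treating y1 and y2 as independent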

I hope this helps,
 John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Christoph Scherber
> Sent: June-04-08 7:37 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Comparing two regression lines
> 
> Dear R users,
> 
> Suppose I have two different response variables y1, y2 that I regress
> separately on the same explanatory variable, x; sample sizes are n1=n2.
> 
> Is it legitimate to compare the regression slopes (equal variances
> assumed) by using
> 
> lm(y~x*FACTOR),
> 
> where FACTOR gets "y1" if y1 is the response, and "y2" if y2 is the
> response?
> 
> The problem I see here is that the residual degrees of freedom obviously
> go up (n1-1+n2-1) although there were only n1=n2 "true" replications.
> 
> On the other hand, Zar (1984) and some other statistics textbooks base
> their calculations on a modified version of the t test (mainly using the
> residual SS from the two regressions), but when I calculated these tests
> by hand I found that the t-values are almost identical to the one I get
> using the lm(...) approach.
> 
> Part of the R script is appended below.
> 
> How would you proceed?
> 
> Many thanks for your help!
> 
> Best wishes,
> Christoph
> 
> ##
> 
> lm1=lm(y1~x) # y1 and y2 are different response variables, scaled
>              # to [0;1] using a ranging transform
> lm2=lm(y2~x) # n is 12 for each regression
> 
> model3=lm(y~x*FACTOR)
> summary(model3)
> 
> sxx1=sum((y1-mean(y1))^2)
> sxx2=sum((y2-mean(y2))^2)
> 
> s.pooled=(4805.2+6946.6)/20 # pooled residual MS: (SS1+SS2)/(n1+n2-4)
> 
> SED=sqrt(s.pooled*(1/sxx1+1/sxx2))
> 
> t=(528.54-446.004)/SED # difference of the two slope estimates divided by SED
> qt(0.025,20) #compare with t distribution at 20 d.f.
> 
> 
> ##
> #(using R 2.6.2 on Windows XP)
> 


