[R] Seeking help for automating regression (over columns) and storing selected output

Gabor Grothendieck ggrothendieck at myway.com
Sat Apr 3 18:45:11 CEST 2004



Note that there is a QUESTION at the end regarding
random effects.

Suppose your data frame is df and has components 
y, x1, x2, x3 and u where u is a factor.  

1. There was a problem posted last month about doing repeated 
regressions (search for "Operating on windows of data") that 
has similarities to this one.  

Making use of those ideas, the first sapply below loops 
over the y~xi regressions on the whole sample and the next 
two (nested) loop over the usergroup-specific regressions.  
We just rbind them all together:

xvars <- c("x1", "x2", "x3")
rbind(
   sapply( xvars, function(xi) coef( lm(y ~ df[,xi], data=df))[[2]] ), 
   sapply( xvars, function(xi)
        sapply( levels(df$u), function(ulev)
		coef(lm(y ~ df[,xi], subset=u==ulev, data=df))[[2]]
	)
   )
)
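A variant of the same loop, sketched so that it runs on its own using
the state data from the P.S. of this message.  It builds each formula
with reformulate() instead of indexing df inside the formula, and it
labels the total-sample row.  The helper name slope and the row label
"All" are made up for this illustration.

```r
# Illustrative data: same construction as in the P.S. of this message
df <- data.frame(y = state.x77[, 4],
                 x1 = state.x77[, 1], x2 = state.x77[, 2], x3 = state.x77[, 3],
                 u = factor(state.region))
xvars <- c("x1", "x2", "x3")

# slope() is a hypothetical helper: fit y ~ xi on the selected rows
# and return the slope (second coefficient)
slope <- function(xi, rows = rep(TRUE, nrow(df)))
   coef(lm(reformulate(xi, response = "y"), data = df[rows, ]))[[2]]

res <- rbind(
   All = sapply(xvars, slope),
   sapply(xvars, function(xi)
        sapply(levels(df$u), function(ulev) slope(xi, df$u == ulev)))
)
res
```

res has one row for the total sample and one row per level of u, with
one column per x variable.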


2. Another possibility is to create a giant regression that does 
all the usergroup specific regressions at once and then repeat 
it without the usergroup variable to get the rest.  

df2 is a new data frame that strings out all the x variables into 
a single long column and adds a new factor i that identifies
which x variable it is.  y and u are repeated three times to bring 
them into line with x.

xvars <- c("x1", "x2", "x3")
xm <- as.matrix(df[,xvars])
df2 <- data.frame(y=rep(df$y,3), x = c(xm), i=factor(c(col(xm))), u=rep(df$u,3))

# We could alternatively have used reshape like this:
# df2 <-  reshape(df,timevar="i",times=factor(1:3),
#                varying=list(xvars),direction="long",v.names="x")

# The slopes by usergroup and across user group are:

coef.u <- coef(lm(y ~ i/u/x, data=df2))
coef.all <- coef(lm(y ~ i/x, data=df2))

# Pick off the slopes (they are the second half of each coef vector) and
# reform.  Within each usergroup the slopes run over x1, x2, x3, so the
# matrix has to be filled by row:

z <- matrix( c( matrix( coef.all, nc=2)[,2], matrix( coef.u, nc=2)[,2] ),
             nc=3, byrow=TRUE)
colnames(z) <- xvars
rownames(z) <- c("All", levels(df$u))
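As a sanity check, here is a sketch (again using the state data from
the P.S.) that compares one slope from the giant regression with the
corresponding separate regression.  Picking the coefficient by name,
and the choice of x2 and the South region, are just for illustration.

```r
# Illustrative data: same construction as in the P.S. of this message
df <- data.frame(y = state.x77[, 4],
                 x1 = state.x77[, 1], x2 = state.x77[, 2], x3 = state.x77[, 3],
                 u = factor(state.region))
xvars <- c("x1", "x2", "x3")
xm <- as.matrix(df[, xvars])
df2 <- data.frame(y = rep(df$y, 3), x = c(xm),
                  i = factor(c(col(xm))), u = rep(df$u, 3))

# Slope terms of the giant regression are the coefficients
# whose names end in ":x"
cf <- coef(lm(y ~ i/u/x, data = df2))
slopes <- cf[grep(":x$", names(cf))]

# Separate regression of y on x2 within the South region
sep <- coef(lm(y ~ x2, data = df, subset = u == "South"))[[2]]

# Compare with the matching term of the giant regression
all.equal(slopes[["i2:uSouth:x"]], sep, tolerance = 1e-6)
```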

3. Note that the giant regression approach works as long as you are only
interested in the coefficients.  If you were interested in the variances,
it would not work, since each of the two giant regressions uses a single
pooled estimate of variance.
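
The pooled-variance point can be seen in a small sketch on the state
data from the P.S.  The slope estimate for x1 is compared between the
giant regression and a separate regression, and so are the standard
errors.  The object names big and sep are made up for this
illustration.

```r
# Illustrative data: same construction as in the P.S. of this message
df <- data.frame(y = state.x77[, 4],
                 x1 = state.x77[, 1], x2 = state.x77[, 2], x3 = state.x77[, 3],
                 u = factor(state.region))
xm <- as.matrix(df[, c("x1", "x2", "x3")])
df2 <- data.frame(y = rep(df$y, 3), x = c(xm),
                  i = factor(c(col(xm))), u = rep(df$u, 3))

big <- summary(lm(y ~ i/x, data = df2))$coefficients   # giant regression
sep <- summary(lm(y ~ x1, data = df))$coefficients     # separate regression

# The slope estimates agree ...
all.equal(big[["i1:x", "Estimate"]], sep[["x1", "Estimate"]])

# ... but the standard errors need not, because the giant fit uses one
# residual variance pooled over all three regressions
c(giant = big[["i1:x", "Std. Error"]], separate = sep[["x1", "Std. Error"]])
```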

QUESTION:  As a matter of interest, would someone who is familiar with random
effects models show what the corresponding giant model is, with separate
variances for each regression?


P.S. I tried the above out on the following which is similar
to the original problem except there are 4 levels in u:

data(state)
x <- state.x77[,1:3]
u <- state.region
y <- state.x77[,4]
df <- data.frame(y=y, x1=x[,1], x2=x[,2], x3=x[,3], u=factor(u))



Greg Blevins <gblevins <at> mn.rr.com> writes:

: 
: Hello, 
: 
: I have spent considerable time trying to figure out that which I am about to
: describe.  This included searching Help, consulting my various R books, and
: trial and (always) error.  I have been assuming I would need to use a loop
: (looping over columns) but perhaps an apply function would do the trick.  I
: have unsuccessfully tried both.
: 
: A scaled down version of my situation is as follows:
: 
: I have a dataframe as follows:
: 
: ID       Y      x1          x2          x3           usergroup.
: 
: Y is a continuous criterion, x1-x3 continuous predictors, and usergroup is
: coded a 1, 2 or 3 to indicate user status.
: 
: My end goal is a (dataframe or matrix) with just the regression coef from
: each of 12 runs (each x regressed separately on Y for the total sample and
: for each usergroup).  I envision output as follows, a three column by four
: row dataframe or matrix.
: 
:                          Y and x1;            Y and x2;         Y and x3.
: Total sample:
: usergroup 1:               
: usergroup 2:               (Regression Coefs fill the matrix) 
: usergroup 3:              
: 
: Using 1.8.1
: Windows 2000 and XP
: 
: Help would be most appreciated.
: 
: Greg Blevins, Partner
: The Market Solutions Group



