[R] Regression with many independent variables

Matthew Douglas matt.douglas01 at gmail.com
Mon Feb 28 21:32:02 CET 2011


Hi,

I am trying to use lm() on some data. The code works, but I would
like a more efficient way to write it.

The data look like this (very sparse: a few 1s and -1s, and the rest
0s):

> head(adj0708)
      MARGIN Poss P235 P247 P703 P218 P430 P489 P83 P307 P337 ....
1   64.28571   29    0    0    0    0    0    0   0    0    0    0    0    0    0
2 -100.00000    6    0    0    0    0    0    0   0    1    0    0    0    0    0
3  100.00000    4    0    0    0    0    0    0   0    1    0    0    0    0    0
4  -33.33333    7    0    0    0    0    0    0   0    0    0    0    0    0    0
5  200.00000    2    0    0    0    0    0    0   0    0    0    0   -1    0    0
6  -83.33333   12    0   -1    0    0    0    0   0    0    0    0    0    0    0

adj0708 is actually a 35657 x 341 data set. Each column after "Poss" is
an independent variable; the dependent variable is "MARGIN", and each
observation is weighted by "Poss".


The regression is below:
fit.adj0708 <- lm(adj0708$MARGIN ~ adj0708$P235 + adj0708$P247 +
    adj0708$P703 + adj0708$P430 + adj0708$P489 + adj0708$P218 +
    adj0708$P605 + adj0708$P337 + .... +
    adj0708$P510, weights = adj0708$Poss)
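One way this formula could be shortened, as a minimal sketch assuming the columns are named as in the head() output above: lm() accepts a data argument, and in a formula the "." shorthand stands for every column except the response, so "- Poss" keeps the weight column out of the predictors. The data frame adj_toy is a hypothetical miniature stand-in invented for illustration, not the real data; reformulate() is shown as an equivalent way to build the same formula from a vector of column names.

```r
# Hypothetical miniature stand-in for adj0708: response MARGIN, weight Poss,
# and sparse predictor columns coded -1/0/1 (invented data, illustration only).
set.seed(1)
n <- 50
adj_toy <- data.frame(
  MARGIN = rnorm(n, sd = 50),
  Poss   = sample(1:30, n, replace = TRUE),
  P235   = sample(c(-1, 0, 1), n, replace = TRUE),
  P247   = sample(c(-1, 0, 1), n, replace = TRUE),
  P703   = sample(c(-1, 0, 1), n, replace = TRUE),
  P430   = sample(c(-1, 0, 1), n, replace = TRUE)
)

# "." = every column in data except the response; "- Poss" drops the weight
# column so it is used only as a weight, not also as a predictor.
# On the real data this would read:
#   lm(MARGIN ~ . - Poss, data = adj0708, weights = Poss)
fit_dot <- lm(MARGIN ~ . - Poss, data = adj_toy, weights = Poss)

# Equivalent: build the formula from the predictor names with reformulate().
preds <- setdiff(names(adj_toy), c("MARGIN", "Poss"))
fit_ref <- lm(reformulate(preds, response = "MARGIN"),
              data = adj_toy, weights = Poss)

# Both fits use the same design matrix, so the coefficients agree.
all.equal(coef(fit_dot), coef(fit_ref))  # TRUE
```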

I have two questions:

1. Is there a way to condense how I write the independent variables
in lm(), instead of having such a long line of code (I have 339
independent variables to be exact)?
2. I would like to pair the variables to run a regression on the
interactions between two independent variables. I think it would look
something like this:
fit.adj0708 <- lm(adj0708$MARGIN ~ adj0708$P235:adj0708$P247 +
    adj0708$P703:adj0708$P430 + adj0708$P489:adj0708$P218 +
    adj0708$P605:adj0708$P337 + ...., weights = adj0708$Poss)
but there will be 339 choose 2 = 57,291 combinations, so a lot of
independent variables! Is there a more efficient way to write this
code?
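For the pairwise interactions, a hedged sketch on the same kind of hypothetical stand-in data (adj_toy is invented for illustration): combn() generates every unordered pair of predictor names, paste() joins each pair with ":", and reformulate() turns those term labels into a formula, giving the interaction-only form shown above. Writing (a + b + ...)^2 instead expands to all main effects plus all pairwise interactions. With the full 339 predictors the interaction-only model would have 57,291 terms plus an intercept, which is more coefficients than the 35,657 rows, so lm() could not estimate them all (aliased coefficients come back as NA), and the dense model matrix alone would hold roughly 2 billion entries.

```r
# Hypothetical miniature stand-in for adj0708 (same layout, invented data).
set.seed(2)
n <- 200
adj_toy <- data.frame(
  MARGIN = rnorm(n, sd = 50),
  Poss   = sample(1:30, n, replace = TRUE),
  P235   = sample(c(-1, 0, 1), n, replace = TRUE),
  P247   = sample(c(-1, 0, 1), n, replace = TRUE),
  P703   = sample(c(-1, 0, 1), n, replace = TRUE),
  P430   = sample(c(-1, 0, 1), n, replace = TRUE)
)
preds <- setdiff(names(adj_toy), c("MARGIN", "Poss"))

# Every unordered pair of predictors as "a:b"; there are choose(length(preds), 2).
pair_terms <- combn(preds, 2, FUN = paste, collapse = ":")

# Interaction-only model, matching the P235:P247 + P703:P430 + ... form.
f_pairs <- reformulate(pair_terms, response = "MARGIN")
fit_pairs <- lm(f_pairs, data = adj_toy, weights = Poss)

# Alternative: (a + b + ...)^2 gives all main effects plus all pairwise interactions.
f_sq <- as.formula(paste("MARGIN ~ (", paste(preds, collapse = " + "), ")^2"))
fit_sq <- lm(f_sq, data = adj_toy, weights = Poss)

length(pair_terms)                        # 6, i.e. choose(4, 2)
length(attr(terms(f_sq), "term.labels"))  # 10 = 4 main effects + 6 interactions
```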

Thanks,
Matt



More information about the R-help mailing list