[R] r code to generate interaction columns

Sharma, Dhruv Dhruv.Sharma at PenFed.org
Mon Mar 8 15:50:53 CET 2010


Thanks, Keith. I wanted generic code that checks each column's data
type, loops through the columns and creates the interaction columns
automatically, as I want to test this out as a new algorithm for data mining.
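A minimal base-R sketch of such generic code: it keeps only the numeric columns (via is.numeric) and appends one product column for every combination of 2 up to max_order of them. The function name add_interactions(), its max_order argument, the "C.D" naming scheme and the toy data are hypothetical, not from the thread. A numeric response column would be picked up too, so it would need to be dropped first.

```r
# Hypothetical helper (not part of base R or any package): append product
# columns for all 2-way up to max_order-way combinations of numeric columns.
add_interactions <- function(dat, max_order = 2) {
  # Names of the numeric columns; factors and character columns are skipped
  num_cols <- names(dat)[vapply(dat, is.numeric, logical(1))]
  out <- dat
  # Orders 2..max_order, capped at the number of numeric columns
  for (k in seq_len(min(max_order, length(num_cols)))[-1]) {
    combos <- combn(num_cols, k)  # one combination per matrix column
    for (j in seq_len(ncol(combos))) {
      vars <- combos[, j]
      # Row-wise product of the chosen columns, named e.g. "C.D" or "C.D.E"
      out[[paste(vars, collapse = ".")]] <- Reduce(`*`, dat[vars])
    }
  }
  out
}

# Toy data shaped like the question: A and B non-numeric, C, D, E numeric
set.seed(1)
dat <- data.frame(A = factor(sample(c("yes", "no"), 10, replace = TRUE)),
                  B = letters[1:10],
                  C = runif(10), D = runif(10), E = runif(10))
res <- add_interactions(dat, max_order = 3)
names(res)  # "A" "B" "C" "D" "E" "C.D" "C.E" "D.E" "C.D.E"
```

The combn() loop keeps the sketch generic: adding a numeric column to the data adds its interactions without changing the code.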

Traditional regression may give misleading results under
multicollinearity, so I wanted to create the interaction terms and run
them through random forests and rpart, which need the interaction
terms to be created manually as columns.

Hope that clarifies.

Dhruv

-----Original Message-----
From: kMan [mailto:kchamberln at gmail.com] 
Sent: Sunday, March 07, 2010 8:08 PM
To: Sharma, Dhruv; r-help at r-project.org
Subject: RE: [R] r code to generate interaction columns

Dear Dhruv,

You could create the interaction variables manually (assuming A is your
dependent variable): just multiply the variables together.
cd.int <- C*D
ce.int <- C*E
cde.int <- C*D*E # what about D*E, or interactions with B?
Include those in your model, for example
A ~ B + C + D + E + cd.int + ce.int + cde.int
Then you can compare those models to the results you get when you
specify the interaction in the model formula directly using the
documented syntax.
In your R console, type ?formula or help("formula") for details.
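A sketch of that comparison on simulated data (the data and object names are assumptions, not from the thread). In formula syntax, C:D adds only the product term, whereas C*D*E expands to C + D + E + C:D + C:E + D:E + C:D:E, which would also cover the D*E term raised in the comment on cde.int.

```r
# Simulated stand-in data: B is a factor, C, D, E numeric, A the response
set.seed(42)
n <- 50
dat <- data.frame(B = factor(sample(c("x", "y"), n, replace = TRUE)),
                  C = rnorm(n), D = rnorm(n), E = rnorm(n))
dat$A <- 1 + dat$C * dat$D + rnorm(n)

# Interaction columns created by hand and stored in the data frame
dat$cd.int  <- dat$C * dat$D
dat$ce.int  <- dat$C * dat$E
dat$cde.int <- dat$C * dat$D * dat$E

fit_manual  <- lm(A ~ B + C + D + E + cd.int + ce.int + cde.int, data = dat)
# The same terms via formula syntax; ":" adds just the product term
fit_formula <- lm(A ~ B + C + D + E + C:D + C:E + C:D:E, data = dat)

# Both design matrices contain the same columns, so the fits agree
all.equal(fitted(fit_manual), fitted(fit_formula))
```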

Sincerely,
KeithC.


-----Original Message-----
From: Sharma, Dhruv [mailto:Dhruv.Sharma at PenFed.org]
Sent: Saturday, March 06, 2010 10:30 AM
To: r-help at r-project.org
Subject: [R] r code to generate interaction columns

Hi,
   is there a way to take a dataset and extract numeric columns and
create interaction columns from it automatically?

   For example, there are 5 columns of data: A, B, C, D, E.

   C, D and E are numeric.

   Can someone provide code to automatically create more columns such
as:

   1) C*D, C*E, C*D*E, (C+E)/(D+.01), (D+E)/(C+.01) and (C+D)/(E+.01),
where .01 is added to each denominator to avoid dividing by zero?
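A sketch that builds exactly the columns listed above with base R's transform(). The data frame dat, its toy values and the new column names (CD, CE_D and so on) are hypothetical.

```r
# Offset added to each denominator to avoid division by zero
eps <- 0.01

# Hypothetical toy data with the question's column names
dat <- data.frame(A = c(1, 0, 1, 0), B = c("a", "b", "a", "b"),
                  C = c(1, 2, 0, 4), D = c(0, 1, 2, 3), E = c(2, 0, 1, 1))

# transform() evaluates each expression inside dat and appends the result
dat <- transform(dat,
                 CD   = C * D,
                 CE   = C * E,
                 CDE  = C * D * E,
                 CE_D = (C + E) / (D + eps),
                 DE_C = (D + E) / (C + eps),
                 CD_E = (C + D) / (E + eps))
dat
```

Adding .01 protects against a zero denominator only when the column is non-negative; a value of exactly -.01 would still divide by zero.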

I know that in glm, multiplying variables in the formula creates
interaction terms, but I want the columns to be part of the data set so
that I can feed them into a random forest to pick out predictive
interaction terms, as regression cannot reliably handle correlated
interaction terms.
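One way to get the columns glm would build, but as part of the data set, is model.matrix(), which expands a formula into its design-matrix columns. A sketch on simulated data (dat is hypothetical); (C + D + E)^3 requests all main effects plus every 2- and 3-way interaction.

```r
set.seed(7)
dat <- data.frame(A = factor(sample(c("yes", "no"), 20, replace = TRUE)),
                  B = rnorm(20), C = runif(20), D = runif(20), E = runif(20))

# Design matrix: intercept, C, D, E, C:D, C:E, D:E, C:D:E
mm <- model.matrix(~ (C + D + E)^3, data = dat)
# Keep only the interaction columns (their names contain ":")
int_cols <- mm[, grepl(":", colnames(mm)), drop = FALSE]
# Replace ":" so the new names are syntactic, e.g. "C.D"
colnames(int_cols) <- gsub(":", ".", colnames(int_cols))

# Append the interaction columns as ordinary variables
dat_aug <- cbind(dat, int_cols)
names(dat_aug)
```

With the default na.action setting, model.matrix() drops rows that have missing values in C, D or E, so the cbind() step assumes those columns contain no NAs.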

If anyone has some simple code that can do this, that would be helpful.

thanks
Dhruv