[R] Formula in a model

Paulito Palmes ppalmes at yahoo.com
Wed Sep 11 16:26:54 CEST 2013



Hello Gerrit, 

Thanks for the explanation. Let me give a specific example.

Assume Day (column 6) is the output and the remaining columns are the input training features. Note that I use the air quality data for illustration purposes only. The input->output mapping may not make sense in a real interpretation of this data.

library(e1071)

data(airquality)
mytable=airquality

colnames(mytable)=c('a','b','c','d','e','f')


modelSVM1=svm(mytable[,6] ~ .,data=mytable)
modelSVM2=svm(mytable[,-6],mytable[,6])
modelSVM3=svm(f ~ ., data=mytable)

predSVM1=predict(modelSVM1,newdata=mytable)
predSVM2=predict(modelSVM2,newdata=mytable[,-6])
predSVM3=predict(modelSVM3,newdata=mytable)

The results in predSVM2 are similar to those in predSVM3 but differ from those in predSVM1.

Question: Which is the correct formulation? Why doesn't R detect the error/discrepancy in the formulation?
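
One way to see where the three svm() calls diverge is to ask R how it expands the dot in each formula. The sketch below uses only terms() from base R, so it does not need e1071; the names labels_index and labels_named are made up for illustration. It suggests that with mytable[,6] on the left-hand side the dot still expands to all six columns, so column f enters modelSVM1 as a predictor as well as the response, whereas f ~ . leaves f out of the right-hand side, which matches the x/y call used for modelSVM2.

```r
data(airquality)
mytable <- airquality
colnames(mytable) <- c('a','b','c','d','e','f')

# Left-hand side given by position: no column *name* appears on the left,
# so the dot expands to every column in 'data', including f.
labels_index <- attr(terms(mytable[,6] ~ ., data = mytable), "term.labels")

# Left-hand side given by name: the dot expands to every column except f.
labels_named <- attr(terms(f ~ ., data = mytable), "term.labels")

print(labels_index)
print(labels_named)
```

Read this way, R has no error to report: mytable[,6] ~ . is a syntactically valid formula that models a column on all columns of the data, including itself.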


If I use the same formulation with rpart using the same data:

library(rpart)

data(airquality)
mytable=airquality

colnames(mytable)=c('a','b','c','d','e','f')

modelRP1=rpart(mytable[,6]~.,data=mytable,method='anova') # this works
modelRP3=rpart(f ~ ., data=mytable,method='anova') # this works

predRP1=predict(modelRP1,newdata=mytable)
predRP3=predict(modelRP3,newdata=mytable)



The results in predRP1 and predRP3 differ, while the statements:

modelRP2=rpart(mytable[,-6],mytable[,6],method='anova') 
predRP2=predict(modelRP2,newdata=mytable[,-6])


produce errors.
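
The error is probably due to rpart() expecting a formula as its first argument; as far as I can tell it offers no x/y interface comparable to the one in svm(). When the response is known only by its column position, one possible workaround is to look up its name and build the formula with reformulate() from base R. This is a sketch under that assumption, not the only way to do it; resp, fml, modelRP, predRP and used are illustrative names.

```r
library(rpart)

data(airquality)
mytable <- airquality
colnames(mytable) <- c('a','b','c','d','e','f')

# The response is identified by position; look up its name for the formula.
resp <- names(mytable)[6]

# reformulate() builds the formula from character vectors:
# all other columns on the right, the response on the left.
fml <- reformulate(setdiff(names(mytable), resp), response = resp)

modelRP <- rpart(fml, data = mytable, method = 'anova')

# The response column is not needed in newdata for prediction.
predRP <- predict(modelRP, newdata = mytable[, -6])

# Variables the tree was allowed to split on; f should not be among them.
used <- attr(terms(modelRP), "term.labels")
print(fml)
print(used)
```

With the formula built this way, the fit should correspond to modelRP3 rather than modelRP1.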



_____________________
From: Gerrit Eichner <Gerrit.Eichner at math.uni-giessen.de>
To: Paulito Palmes <ppalmes at yahoo.com> 
Cc: "r-help at r-project.org" <r-help at r-project.org> 
Sent: Wednesday, 11 September 2013, 10:48
Subject: Re: [R] Formula in a model


Hello, Paulito,

first, I think you haven't received an answer yet because you did not 
"provide commented, minimal, self-contained, reproducible code" as the 
posting guide requests.

Second, see inline below.

On Wed, 11 Sep 2013, Paulito Palmes wrote:

> Hi,
>
> I have a data.frame with dimension 336x336 called *training*, and 
> another one called *observation* which is 336x1. I combined them into one 
> table using table=data.frame(training, observation). table now has 
> dimension 336x337, with the last column being the observation to learn 
> from the training data in the remaining columns. For 
> prediction, I combined the testing data and its observation and passed them as 
> predict(model,testingWTesingObservation)
>
>
> I've used the formula: rpart(table[,337] ~ ., data=table) or 
> svm(table[,337] ~ ., data=table).

I am not familiar with rpart() nor with svm() but "table[,337] ~ ., data = 
table" has the consequence that table[,337] is also in the right hand side 
of the formula, so that your "observations" are also in the "training" 
data. That doesn't seem to make sense to me, and is different from the 
call to svm() below.
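
The effect of having the response on both sides can be made visible with a small experiment. The sketch below swaps in lm() from base R as a stand-in for svm() and rpart(), purely because it needs no extra packages and handles the formula the same way; it reuses the airquality example with the renamed columns, and fit_leak, fit_ok and the max_resid_* names are illustrative. If the response leaks into the predictors, the fit should be essentially perfect, which is not a sign of a good model.

```r
data(airquality)
mytable <- airquality
colnames(mytable) <- c('a','b','c','d','e','f')

# lm() is only a stand-in here; the point is how the formula is expanded.
fit_leak <- lm(mytable[,6] ~ ., data = mytable)  # f is response AND predictor
fit_ok   <- lm(f ~ ., data = mytable)            # f is response only

# Largest absolute residual on the rows used for fitting.
max_resid_leak <- max(abs(resid(fit_leak)))
max_resid_ok   <- max(abs(resid(fit_ok)))

print(c(leak = max_resid_leak, ok = max_resid_ok))
```

A near-zero residual for the first fit would indicate that the model simply reproduces column f from itself.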

  Hth  --  Gerrit

> I recently discovered that this formulation produces a different model 
> from the svm(training, observation) formulation. Which one is correct, and 
> why is the other one not? I thought that, syntactically, both were 
> the same. I would hope that R could detect the error in one of the 
> formulations, to avoid the possibility of using it.
>
> Regards,
> Paul
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


