[R] Caret and Model Prediction

Sun Oct 5 23:45:05 CEST 2014

Yes, you should train 5 separate models, one per outcome, or construct a
single outcome that combines them, since caret's train() only accepts a
single numeric or factor vector as the outcome.
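
For what it is worth, here is a minimal sketch of the "one model per
outcome" approach. It is only an illustration under some assumptions: the
`training` and `test` data frames below are synthetic stand-ins (with
made-up feature names x1, x2, x3) for your real data, and I use
method = "lm" with plain 5-fold CV so that it runs quickly. You would put
back method = "gbm" together with your own fitControl and gbmGrid.

```r
library(caret)

set.seed(825)

# Hypothetical stand-in for the `training` data frame in the original post:
# three numeric features plus the five outcome columns.
n <- 60
training <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
outcomes <- c("Ca", "P", "pH", "SOC", "Sand")
for (nm in outcomes) {
  training[[nm]] <- training$x1 - 0.5 * training$x2 + rnorm(n, sd = 0.1)
}
features <- setdiff(names(training), outcomes)

# Simple 5-fold CV to keep the sketch fast (the post uses repeatedcv).
fitControl <- trainControl(method = "cv", number = 5)

# Fit one model per outcome. Each formula has exactly one response on the
# left-hand side, and the other four outcomes are dropped from `data` so
# that they are not used as predictors.
fits <- lapply(outcomes, function(nm) {
  train(reformulate(features, response = nm),
        data = training[, c(features, nm)],
        method = "lm",
        trControl = fitControl)
})
names(fits) <- outcomes

# Hypothetical test set: here just the first 10 rows of the features.
test <- training[1:10, features]

# predict() on each train object uses its final model, refit with the
# selected tuning values. Binding the results gives one column per outcome.
mypred <- as.data.frame(lapply(fits, predict, newdata = test))
```

With this, mypred has one column for each of the five outcomes, which is
what you expected to get in your point 3).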

On Sun, Oct 5, 2014 at 1:51 PM, Lorenzo Isella <lorenzo.isella at gmail.com>
wrote:

> Thanks a lot.
> At this point I wonder: given that my response consists of 5
> outcomes for each set of features, should I then train 5 different
> models (one for each of them)?
> Cheers
>
> Lorenzo
>
>
> On Sun, Oct 05, 2014 at 11:04:01AM -0700, Jia Xu wrote:
>
>> Hi, Lorenzo:
>>  1) I think the formula is not correct. It should have the form
>> outcome ~ features, with a single outcome on the left-hand side, which is
>> why you get the weird result in 3).
>>  2) predict in caret automatically uses the best model found during
>> tuning, if there is one (sometimes it fails). You can print the model to
>> see the cross-validation results. Furthermore, you can specify the
>> performance metric used to select the optimal model. Please see the caret
>> tutorial for details on how to do this.
>>
>> On Sun, Oct 5, 2014 at 8:54 AM, Lorenzo Isella <lorenzo.isella at gmail.com>
>> wrote:
>>
>>> Dear All,
>>> I am learning the ropes of CARET for automatic model training, more or
>>> less following the steps of the tutorial at
>>>
>>> http://bit.ly/ZJQINa
>>>
>>> However, there are a few things about which I would like a piece of
>>> advice.
>>>
>>> Consider for instance the following model
>>>
>>> #############################################################
>>>
>>> set.seed(825)
>>>
>>> fitControl <- trainControl(## 10-fold CV
>>>                           method = "repeatedcv",
>>>                           number = 10,
>>>                           ## repeated ten times
>>>                           repeats = 10)
>>>
>>> gbmGrid <-  expand.grid(interaction.depth = c(1, 5, 9),
>>>                        n.trees = (1:30)*50,
>>>                        shrinkage = 0.05)
>>>
>>> nrow(gbmGrid)
>>>
>>> gbmFit <- train(Ca+P+pH+SOC+Sand~ ., data = training,
>>>                 method = "gbm",
>>>                 trControl = fitControl,
>>>                 ## This last option is actually one
>>>                 ## for gbm() that passes through
>>>                 verbose = TRUE,
>>>                 ## Now specify the exact models
>>>                 ## to evaluate:
>>>                 tuneGrid = gbmGrid
>>>                 )
>>>
>>> #############################################################
>>>
>>> I am trying to tune a model that predicts the values of 5 columns
>>> whose names are "Ca","P","pH", "SOC", and "Sand".
>>>
>>> 1) Am I using the formula syntax in a correct way?
>>>
>>> I then try to apply my model on the test data by coding
>>>
>>> mypred <- predict(gbmFit, newdata=test)
>>>
>>> However, at this point I am left with a couple of questions
>>>
>>> 2) does "predict" automatically select the best tuned model in gbmFit?
>>> and if not, what am I supposed to do?
>>> 3) I do not get any error messages, but mypred consists of a single
>>> column instead of 5 columns corresponding to the 5 variables I am
>>> trying to predict, so something is obviously wrong (see point 1). Any
>>> suggestions here?
>>>
>>> Many thanks
>>>
>>> Lorenzo
>>
>>
>> --
>> Jia Xu
>>
>

-- 
Jia Xu
