[R] Parallelizing GBM

Mxkuhn mxkuhn at gmail.com
Sun Mar 24 15:22:27 CET 2013


Yes, I think the second link is a test build of a parallelized cv loop within gbm(). 
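
For anyone who cannot use the test build, the same idea (one cross-validation fold per core) can be sketched by hand with the parallel package. This is only a rough sketch and not the code in the parallel branch: the simulated data, the gaussian loss, and the names fold, fit_fold and cv_mse are made up for illustration. Boosting adds trees one after another, so the folds are the independent pieces of work here.

```r
library(parallel)
library(gbm)

# Simulated regression data (illustration only, not the poster's data)
set.seed(1)
n <- 400
x <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
y <- 2 * x$x1 - x$x2 + rnorm(n, sd = 0.1)

# Assign each row to one of k folds
k <- 4
fold <- sample(rep(seq_len(k), length.out = n))

# Fit on all folds except i, return the mean squared error on fold i
fit_fold <- function(i) {
  train <- fold != i
  fit <- gbm::gbm.fit(x[train, , drop = FALSE], y[train],
                      distribution = "gaussian",
                      n.trees = 50,
                      interaction.depth = 5,
                      n.minobsinnode = 10,
                      shrinkage = 0.1,
                      bag.fraction = 0.5,
                      verbose = FALSE)
  pred <- predict(fit, newdata = x[!train, , drop = FALSE], n.trees = 50)
  mean((y[!train] - pred)^2)
}

# mclapply forks the R process, which is not available on Windows,
# so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 4L
cv_mse <- unlist(mclapply(seq_len(k), fit_fold, mc.cores = cores))

mean(cv_mse)  # cross-validated MSE averaged over the k folds
```

Each fold is a separate gbm.fit call, so the four fits can run at the same time; a single gbm.fit call still runs on one core.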


On Mar 24, 2013, at 9:28 AM, "Lorenzo Isella" <lorenzo.isella at gmail.com> wrote:

> Thanks a lot for the quick answer.
> However, from what I see, the parallelization affects only the cross-validation part in the gbm interface (but it changes nothing when you call gbm.fit).
> Am I missing anything here?
> Is there any fundamental reason why gbm.fit cannot be parallelized?
> 
> Lorenzo
> 
> 
> 
> On Sun, 24 Mar 2013 12:45:39 +0100, Max Kuhn <mxkuhn at gmail.com> wrote:
> 
>> See this:
>> 
>>  https://code.google.com/p/gradientboostedmodels/issues/detail?id=3
>> 
>> 
>> and this:
>> 
>>  https://code.google.com/p/gradientboostedmodels/source/browse/?name=parallel
>> 
>> 
>> 
>> Max
>> 
>> 
>> On Sun, Mar 24, 2013 at 7:31 AM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
>> 
>>> Dear All,
>>> 
>>> I am far from being a guru about parallel programming.
>>> 
>>> Most of the time, I rely on randomForest for data mining large datasets.
>>> 
>>> I would also like to try the gradient boosted methods in GBM, but I need parallelization.
>>> 
>>> I normally rely on gbm.fit for speed reasons, and I usually call it this way
>>> 
>>> gbm_model <- gbm.fit(trainRF, prices_train,
>>>                      offset = NULL,
>>>                      misc = NULL,
>>>                      distribution = "multinomial",
>>>                      w = NULL,
>>>                      var.monotone = NULL,
>>>                      n.trees = 50,
>>>                      interaction.depth = 5,
>>>                      n.minobsinnode = 10,
>>>                      shrinkage = 0.001,
>>>                      bag.fraction = 0.5,
>>>                      nTrain = (n_train/2),
>>>                      keep.data = FALSE,
>>>                      verbose = TRUE,
>>>                      var.names = NULL,
>>>                      response.name = NULL)
>>> 
>>> Does anybody know an easy way to parallelize the model (in this case it means simply having 4 cores on the same machine working on the problem)?
>>> 
>>> Any suggestion is welcome.
>>> 
>>> Cheers
>>> 
>>> 
>>> 
>>> Lorenzo
>> 
>> 
>> 
>> --
>> Max


