[R] SVM Param Tuning using SNOW package

David Winsemius dwinsemius at comcast.net
Thu Nov 19 04:50:41 CET 2009


On Nov 18, 2009, at 12:35 PM, Max Kuhn wrote:

> On Tue, Nov 17, 2009 at 6:01 PM, raluca <ucagui at hotmail.com> wrote:
>>
>> Hello,
>>
>> This is the first time I am using the snow package, and I am trying
>> to tune the cost parameter for a linear SVM, where the cost (variable
>> cost1) takes 10 values between 0.5 and 30.
>>
>> I have a large dataset and a PC that is not very powerful, so I need
>> to tune the parameters using both CPUs of the PC.
>>
>> Somehow I cannot manage to do it. It seems that both CPUs are fitting
>> the model for the same values of cost1, I guess the first 5, but not
>> the last 5.
>>
>> Can anyone please help me? :-((
>
> This is pretty easy to do with the train() function in the caret
> package. From ?train, here is an example for a different data set:
>
>> library(caret)
>> library(snow)
>> library(mlbench)
>>
>> data(BostonHousing)
>>
>> mpiCalcs <- function(X, FUN, ...)
> +   {
> +     theDots <- list(...)
> +     parLapply(theDots$cl, X, FUN)
> +   }
>>
>> cl <- makeCluster(5, "MPI")
>>
>> ## 50 bootstrap models distributed across 5 workers
>> mpiControl <- trainControl(workers = 5,
> +                            number = 50,
> +                            computeFunction = mpiCalcs,
> +                            computeArgs = list(cl = cl))
>> set.seed(1)
>> usingMPI <-  train(medv ~ .,
> +                    data = BostonHousing,
> +                    "svmLinear",
> +                    tuneGrid = data.frame(.C = seq(.5, 30, length = 10)),
> +                    trControl = mpiControl)
>>
>> stopCluster(cl)
> [1] 1
>
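Max's mpiCalcs() is only a thin wrapper that passes the work to snow's parLapply(). You can therefore use the same mechanism directly, without caret, to give each CPU a different share of the cost values. The sketch below illustrates this under stated assumptions and is not the original poster's code. It uses the parallel package, which ships with R 2.14 and later and offers largely the same makeCluster()/parLapply()/clusterEvalQ() interface as snow. It also assumes a two-process local cluster instead of MPI, e1071's svm() with 5-fold cross-validation as the linear SVM, and the iris data as a stand-in for the poster's dataset. The worker process ID is recorded so you can check that each cost value was fitted once, on exactly one worker.

```r
## Sketch: fit each of 10 cost values exactly once, split over 2 workers.
## Assumptions: e1071 is installed, iris stands in for the real data,
## and a local socket cluster replaces MPI.
library(parallel)   # makeCluster(), parLapply(), clusterEvalQ(), ...

cost1 <- seq(0.5, 30, length.out = 10)   # the 10 candidate costs

cl <- makeCluster(2)                          # one worker per CPU
invisible(clusterEvalQ(cl, library(e1071)))   # load e1071 on each worker
clusterSetRNGStream(cl, 1)                    # reproducible CV folds

## parLapply() splits cost1 into one contiguous chunk per worker
## (5 values each here); the data travel as an extra argument
## instead of relying on objects in the master's workspace.
out <- parLapply(cl, cost1, function(C, dat) {
  fit <- svm(Species ~ ., data = dat, kernel = "linear",
             cost = C, cross = 5)
  data.frame(cost = C, pid = Sys.getpid(),
             accuracy = fit$tot.accuracy)   # 5-fold CV accuracy in %
}, dat = iris)

stopCluster(cl)

res <- do.call(rbind, out)   # one row per cost value
print(res)
res[which.max(res$accuracy), ]   # best cost on this criterion
```

The design point is that the cost values are the X argument of parLapply(), so the scheduler, not the user code, decides which worker gets which value. The pid column should show two distinct process IDs, each attached to a different half of cost1.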

Well, that _was_ interesting. I submitted this job, modified so that
the number of cluster nodes and workers was eight, on a Mac Pro (with
8 cores and 16 GB) and watched the CPU usage as reported by Activity
Monitor.app. The CPU activity is divided into system and user, and
over the course of that run (which took several minutes) the system
proportion gradually rose to about 75% of the total.

Did you expect this task to be comparable in complexity to the one
described by the OP?

And should I be looking for a tangible result? Looking at usingMPI
with str(), I see what first looks like a 50 x 506 matrix but is
actually a list of integers, usingMPI$control$index, as well as quite
a bit of other material that looks like input and side effects of the
multi-processor activity or setup.
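The list in usingMPI$control$index most likely holds the bootstrap resamples themselves. It contains one integer vector of row numbers per resample: 50 resamples (number = 50) of the 506 BostonHousing rows, which is why it first resembles a 50 x 506 matrix. The sketch below rebuilds a list of the same shape in base R. It assumes plain sampling with replacement (caret's own helper for this is createResample()), and the element names are chosen for illustration only. The tuning outcome itself should be in other components of the train object, such as usingMPI$results (one row per value of C) and usingMPI$bestTune.

```r
## Sketch: the shape of a bootstrap index list like control$index.
## Assumption: plain sampling with replacement; names are illustrative.
set.seed(1)
n <- 506   # rows in BostonHousing
B <- 50    # number = 50 in trainControl()

index <- lapply(seq_len(B), function(b) sample.int(n, n, replace = TRUE))
names(index) <- sprintf("Resample%02d", seq_len(B))
str(index[1:2])   # each element is an integer vector of 506 row numbers

## Rows never drawn in a resample are its out-of-bag rows; on average
## there are n * (1 - 1/n)^n of them, i.e. roughly 186 of the 506.
oobSize <- sapply(index, function(i) length(setdiff(seq_len(n), i)))
summary(oobSize)
```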

-- 
David

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



