[R] SVM Param Tuning with using SNOW package

Wed Nov 18 23:03:23 CET 2009

On Nov 18, 2009, at 4:21 PM, raluca wrote:

>
> Hi David,
>
> I have no idea what "magic" you did, but running exactly the same code
> as you, I have the same problem as before: the results are identical
> in consecutive pairs, while I should get a different result for each
> value of cost1 (a vector of 10 values running between 0.5 and 30).

Maybe you should post more details about the hardware? Magic? I am
not particularly experienced with parallel processing. All I did was
read the help pages and make a couple of changes that better matched
what the functions specified and the examples illustrated. This is
actually the first parallel code that I have gotten to run.
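
For what it is worth, here is a minimal sketch of the export step, written against the parallel package rather than snow (an assumption on my part: parallel ships with newer versions of R and provides makeCluster(), clusterExport() and clusterApply() under the same names). The names shift_by and add_shift are made up for illustration; they stand in for the data and for sv.lin().

```r
library(parallel)

shift_by <- 100                        # hypothetical global that the worker function needs
add_shift <- function(c) c + shift_by  # stand-in for sv.lin(): one argument, one global

cl <- makeCluster(2)                   # two local socket workers

# Copy the global to every worker. envir = environment() exports from
# wherever this script happens to be evaluated.
clusterExport(cl, "shift_by", envir = environment())

# The second argument is the vector to iterate over: add_shift() is called
# once per element, with the elements dealt out to the workers in turn.
res <- clusterApply(cl, c(0.5, 1, 1.5), add_shift)
stopCluster(cl)

print(unlist(res))
```

Anything the function reads as a global (NR, X and Y in the real code) has to exist on each worker; the values being iterated over arrive as the function argument.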

> This is the result I get.
>
> 0.2197162, 0.2197162, 0.1467448, 0.1467448, 0.2247955, 0.2247955,
> 0.1073280, 0.1073280, 0.2332475, 0.2332475
>
> Anyway, thanks a lot for trying.
>
> PS. Probably I should switch to Mac :)

I just ran it again (took a couple of seconds on a 2009 unibody  
MacBook Pro (Core 2 Duo) w/ 8 GB):
 > RMSEP
[[1]]
[1] 0.1720245

[[2]]
[1] 0.3396405

[[3]]
[1] 0.2359737

[[4]]
[1] 0.203541

[[5]]
[1] 0.1965804

[[6]]
[1] 0.1662158

[[7]]
[1] 0.1705594

[[8]]
[1] 0.2553175

[[9]]
[1] 0.1748892

[[10]]
[1] 0.09500263
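
One possible explanation for results that repeat in pairs on a two-node cluster, offered as a guess and not a diagnosis: if both workers start from the same random number state, sample() picks the same 50 rows on each of them, and the fit may then change very little with the cost. The sketch below uses the parallel package (an assumption, it is not snow) to show the effect and one remedy, clusterSetRNGStream(). The seed values are arbitrary. snow has clusterSetupRNG() for the same purpose, which as far as I know needs an extra package such as rlecuyer.

```r
library(parallel)

cl <- makeCluster(2)

# Same seed on both workers: each one draws the identical "random" split.
same <- clusterEvalQ(cl, { set.seed(1); sample(1:60, 5) })

# Independent L'Ecuyer streams, one per worker, derived from a single seed.
clusterSetRNGStream(cl, iseed = 2009)
indep <- clusterEvalQ(cl, sample(1:60, 5))

stopCluster(cl)

print(same)    # compare the two list elements by eye
print(indep)
```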

>
>
> David Winsemius wrote:
>>
>> I cannot really be sure what you are trying to do, but doing a bit of
>> "surgery" on your code lets it run on a multicore Mac:
>>
>> library(e1071)
>> library(snow)
>> library(pls)
>>
>> data(gasoline)
>>
>> X=gasoline$NIR
>> Y=gasoline$octane
>>
>> NR=10
>> cost1=seq(0.5,30, length=NR)
>>
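>> sv.lin <- function(c) {
>>   # c is one cost value; clusterApply() calls sv.lin once per element of cost1
>>   for (i in 1:NR) {
>>     # return() below exits on the first pass, so only i = 1 runs and c[i] is c
>>     ind <- sample(1:60, 50)   # random split: 50 training rows, 10 test rows
>>     gTest  <- data.frame(Y = I(Y[-ind]), X = I(X[-ind, ]))
>>     gTrain <- data.frame(Y = I(Y[ind]),  X = I(X[ind, ]))
>>
>>     svm.lin <- svm(gTrain$X, gTrain$Y, kernel = "linear", cost = c[i],
>>                    cross = 5)
>>     results.lin <- predict(svm.lin, gTest$X)
>>
>>     # root mean squared error of prediction on the held-out rows
>>     e.test.lin <- sqrt(sum((results.lin - gTest$Y)^2) / length(gTest$Y))
>>
>>     return(e.test.lin)
>>   }
>> }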
>>
>> cl<- makeCluster(2, type="SOCK" )
>>
>> clusterEvalQ(cl, library(e1071))
>> cost1=seq(0.5,30, length=NR)
>>
>> clusterExport(cl,c("NR","Y","X",  "cost1"))
>> # Pretty sure you need a copy of cost1 on each node.
>>
>>
>> RMSEP<-clusterApply(cl, cost1, sv.lin)
>> # I thought the second argument was the matrix or vector over which to
>> # iterate.
>>
>> stopCluster(cl)
>>
>> # Since I don't know what the model meant, I cannot determine whether
>> # this result is interpretable.
>> > RMSEP
>> [[1]]
>> [1] 0.1921887
>>
>> [[2]]
>> [1] 0.1924917
>>
>> [[3]]
>> [1] 0.1885066
>>
>> [[4]]
>> [1] 0.1871466
>>
>> [[5]]
>> [1] 0.3550932
>>
>> [[6]]
>> [1] 0.1226460
>>
>> [[7]]
>> [1] 0.2426345
>>
>> [[8]]
>> [1] 0.2126299
>>
>> [[9]]
>> [1] 0.2276286
>>
>> [[10]]
>> [1] 0.2064534
>>
>> -- 
>> David Winsemius, MD
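
A note on the for loop in sv.lin(): because return() sits inside the loop, the function hands back the error from the first random split only. If the intent was to average over NR splits (an assumption on my part, the thread does not say), the values have to be collected before returning. A toy sketch in base R follows; one_split() is a made-up stand-in for the fit-and-predict step and does not fit an SVM.

```r
NR <- 10

# Hypothetical stand-in for one train/test split: returns an "error" value
# that depends on the cost and on a random draw.
one_split <- function(cost) abs(rnorm(1, mean = 1 / cost, sd = 0.05))

# return() inside the loop: the function exits during the first iteration.
first_only <- function(cost) {
  n_done <- 0
  for (i in 1:NR) {
    n_done <- n_done + 1
    return(n_done)
  }
}

# Collect one value per split, then summarise once the loop has finished.
mean_over_splits <- function(cost) {
  errs <- vapply(1:NR, function(i) one_split(cost), numeric(1))
  mean(errs)
}

set.seed(42)                      # arbitrary seed
print(first_only(0.5))            # number of iterations actually completed
print(mean_over_splits(0.5))      # average over all NR stand-in splits
```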
>>
>> On Nov 18, 2009, at 7:09 AM, raluca wrote:
>>
>>>
>>> Hi Charlie,
>>>
>>>
>>> Yes, you are perfectly right: when I make the cluster I should use 2,
>>> not 10 (the 10 was left over from previous trials with 10 slaves).
>>>
>>> cl<- makeCluster(2, type="SOCK" )
>>>
>>> To tell the truth, I do not understand very well what the 2nd
>>> parameter for clusterApplyLB() has to be.
>>>
>>> If the function sv.lin has just 1 parameter, sv.lin(c), where c is the
>>> cost, how should I call clusterApplyLB?
>>>
>>>
>>> ? clusterApplyLB(cl, ?, sv.lin, c=cost1)  ?
>>>
>>>
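
On the question of the second argument, as far as I can tell from the help page: it is the vector of values to hand out, and the function receives one element per call, so a one-argument sv.lin(c) needs no c= at all. A sketch using the parallel package (an assumption, chosen because it ships with newer R and keeps the snow function names); square_cost is a made-up stand-in for sv.lin.

```r
library(parallel)

NR <- 10
cost1 <- seq(0.5, 30, length.out = NR)

# Stand-in for sv.lin(c): takes ONE cost value and returns one number.
square_cost <- function(c) c^2

cl <- makeCluster(2)
# Each element of cost1 becomes the first argument of one call; the
# load-balanced variant gives the next element to whichever worker is free.
res <- clusterApplyLB(cl, cost1, square_cost)
stopCluster(cl)

print(length(res))   # one result per cost value, returned in the order of cost1
```

With the real function the call would then read RMSEP <- clusterApplyLB(cl, cost1, sv.lin).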
>>>
>>> Below, I am providing a working example, using the gasoline data that
>>> comes in the pls package.
>>>
>>> Thank you for your time!
>>>
>>>
>>> library(e1071)
>>> library(snow)
>>> library(pls)
>>>
>>> data(gasoline)
>>>
>>> X=gasoline$NIR
>>> Y=gasoline$octane
>>>
>>> NR=10
>>> cost1=seq(0.5,30, length=NR)
>>>
>>>
>>> sv.lin<- function(c) {
>>>
>>> for (i in 1:NR) {
>>>
>>> ind=sample(1:60,50)
>>> gTest<-  data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
>>> gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))
>>>
>>> svm.lin   	  <- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c[i],
>>> cross=5)
>>> results.lin   <- predict(svm.lin, gTest$X)
>>>
>>> e.test.lin     <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))
>>>
>>> return(e.test.lin)
>>> }
>>> }
>>>
>>>
>>> cl<- makeCluster(2, type="SOCK" )
>>>
>>>
>>> clusterEvalQ(cl,library(e1071))
>>>
>>>
>>> clusterExport(cl,c("NR","Y","X"))
>>>
>>>
>>> RMSEP<-clusterApplyLB(cl,?,sv.lin,c=cost1)
>>>
>>> stopCluster(cl)
>>>
>>>
>>>
>>>
>>>
>>> cls59 wrote:
>>>>
>>>>
>>>> raluca wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> This is the first time I am using the SNOW package and I am trying to
>>>>> tune the cost parameter for a linear SVM, where the cost (variable
>>>>> cost1) takes 10 values between 0.5 and 30.
>>>>>
>>>>> I have a large dataset and a PC that is not very powerful, so I need
>>>>> to tune the parameters using both CPUs of the PC.
>>>>>
>>>>> Somehow I cannot manage to do it. It seems that both CPUs are fitting
>>>>> the model for the same values of cost1 (I guess the first 5) but not
>>>>> for the last 5.
>>>>>
>>>>> Please, can anyone help me?
>>>>>
>>>>> Here is the code:
>>>>>
>>>>> data <- data.frame(Y=I(Y),X=I(X))
>>>>> data.X<-data$X
>>>>> data.Y<-data$Y
>>>>>
>>>>>
>>>>
>>>>
>>>> Helping you will be difficult, as we're only three lines into your
>>>> example and already I have no idea what the data you are using looks
>>>> like. Example code needs to be fully reproducible: that means a small
>>>> slice of representative data needs to be provided or faked using an
>>>> appropriate random number generator.
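
For example, a stand-in dataset could be faked along these lines. The 414 rows match the sample(1:414, 276) call in the code below; the 20 predictor columns, the seed, and the noise level are arbitrary assumptions, since the real data were not described.

```r
set.seed(123)                       # arbitrary seed, so everyone gets the same numbers

n <- 414                            # rows, matching sample(1:414, 276)
p <- 20                             # number of predictors: a guess

X <- matrix(rnorm(n * p), nrow = n, ncol = p)
Y <- as.vector(X %*% runif(p)) + rnorm(n, sd = 0.1)   # linear signal plus noise

# Same construction as in the posted code: X is kept as one matrix column.
data <- data.frame(Y = I(Y), X = I(X))
data.X <- data$X
data.Y <- data$Y

str(data)
```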
>>>>
>>>> Some things did jump out at me about your approach and I've made some
>>>> notes below.
>>>>
>>>>
>>>>
>>>> raluca wrote:
>>>>>
>>>>> NR=10
>>>>> cost1=seq(0.5,30, length=NR)
>>>>>
>>>>> sv.lin<- function(cl,c) {
>>>>>
>>>>> for (i in 1:NR) {
>>>>>
>>>>> ind=sample(1:414,276)
>>>>>
>>>>> hogTest<-  data.frame(Y=I(data.Y[-ind]),X=I(data.X[-ind,]))
>>>>> hogTrain<- data.frame(Y=I(data.Y[ind]),X=I(data.X[ind,]))
>>>>>
>>>>> svm.lin   	  <- svm(hogTrain$X,hogTrain$Y,
>>>>> kernel="linear",cost=c[i],
>>>>> cross=5)
>>>>> results.lin   <- predict(svm.lin, hogTest$X)
>>>>>
>>>>> e.test.lin     <- sqrt(sum((results.lin-hogTest$Y)^2)/
>>>>> length(hogTest$Y))
>>>>>
>>>>> return(e.test.lin)
>>>>> }
>>>>> }
>>>>>
>>>>> cl<- makeCluster(10, type="SOCK" )
>>>>>
>>>>
>>>>
>>>> If your machine has two cores, why are you setting up a cluster with
>>>> 10 nodes? Usually the number of nodes should equal the number of cores
>>>> on your machine in order to keep things efficient.
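
One way to avoid hard-coding the node count, sketched with detectCores() from the parallel package (an assumption here: parallel is not snow, and detectCores() can return NA on some platforms, hence the fallback):

```r
library(parallel)

n_cores <- detectCores()            # logical cores reported by the OS; may be NA
if (is.na(n_cores)) n_cores <- 1L   # fall back to a single worker

cat("Would start", n_cores, "worker(s)\n")
# cl <- makeCluster(n_cores)        # then proceed as in the code above
```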
>>>>
>>>>
>>>>
>>>> raluca wrote:
>>>>>
>>>>>
>>>>> clusterEvalQ(cl,library(e1071))
>>>>>
>>>>> clusterExport(cl,c("data.X","data.Y","NR","cost1"))
>>>>>
>>>>> RMSEP<-clusterApplyLB(cl,cost1,sv.lin)
>>>>>
>>>>
>>>>
>>>> Are you sure this evaluation even produces results? sv.lin() is a
>>>> function you defined above that takes two parameters, "cl" and "c".
>>>> clusterApplyLB() will feed values of cost1 into sv.lin() for the
>>>> argument "cl", but it has nothing to give for "c". At the very least,
>>>> it seems like you would need something like:
>>>>
>>>> RMSEP <- clusterApplyLB( cl, cost1, sv.lin, c = someVector )
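
The argument matching can be checked without a cluster, since clusterApply() and clusterApplyLB() call fun(x[[i]], ...) in the same way lapply() does. A toy sketch in base R; two_args and one_arg are made-up stand-ins for the two shapes of sv.lin, and the value 5 is arbitrary.

```r
cost1 <- seq(0.5, 30, length.out = 10)

two_args <- function(cl, c) c * 2   # shaped like sv.lin(cl, c)
one_arg  <- function(c) c * 2       # shaped like sv.lin(c)

# Each element of cost1 fills the FIRST formal, cl, so c is never supplied
# and the call fails as soon as c is used.
bad <- tryCatch(lapply(cost1, two_args), error = function(e) e)
print(conditionMessage(bad))

# Supplying c by name works, but then every call sees the same c.
fixed_c <- lapply(cost1, two_args, c = 5)

# With a single formal, each call receives its own cost value.
ok <- lapply(cost1, one_arg)
print(unlist(ok)[1:3])
```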
>>>>
>>>>
>>>>
>>>> raluca wrote:
>>>>>
>>>>>
>>>>> stopCluster(cl)
>>>>>
>>>>>
>>>>
>>>>
>>>> Sorry I can't be very helpful, but with no data and no apparent way to
>>>> legally call sv.lin() the way you have it set up, I can't investigate
>>>> the problem to see if I get the same results you described. If you
>>>> could provide a complete working example, then there's a better chance
>>>> that someone on this list will be able to help you.
>>>>
>>>> Good luck!
>>>>
>>>> -Charlie
>>>>
>>>
>>> -- 
>>> View this message in context:
>>> http://old.nabble.com/SVM-Param-Tuning-with-using-SNOW-package-tp26399401p26406709.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> -- 
> View this message in context: http://old.nabble.com/SVM-Param-Tuning-with-using-SNOW-package-tp26399401p26415997.html
> Sent from the R help mailing list archive at Nabble.com.
>