[R] SVM Param Tuning with using SNOW package

David Winsemius dwinsemius at comcast.net
Wed Nov 18 15:44:09 CET 2009


I cannot really be sure what you are trying to do, but doing a bit of
"surgery" on your code lets it run on a multicore Mac:

library(e1071)
library(snow)
library(pls)

data(gasoline)

X=gasoline$NIR
Y=gasoline$octane

NR=10
cost1=seq(0.5,30, length=NR)

sv.lin<- function(c) {
# clusterApply() calls this once per element of cost1, so c is a single
# cost value and no loop over NR is needed here.

ind=sample(1:60,50)
gTest<-  data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))

svm.lin       <- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c, cross=5)
results.lin   <- predict(svm.lin, gTest$X)

# root mean squared error of prediction on the held-out samples
e.test.lin     <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))

return(e.test.lin)
}

cl<- makeCluster(2, type="SOCK" )

clusterEvalQ(cl, library(e1071))
cost1=seq(0.5,30, length=NR)

clusterExport(cl,c("NR","Y","X",  "cost1"))
# Pretty sure you need a copy of cost1 on each node.


RMSEP<-clusterApply(cl, cost1, sv.lin)
# I thought the second argument was the matrix or vector over which to
# iterate.

stopCluster(cl)

# Since I don't know what the model meant, I cannot determine whether
# this result is interpretable:
 > RMSEP
[[1]]
[1] 0.1921887

[[2]]
[1] 0.1924917

[[3]]
[1] 0.1885066

[[4]]
[1] 0.1871466

[[5]]
[1] 0.3550932

[[6]]
[1] 0.1226460

[[7]]
[1] 0.2426345

[[8]]
[1] 0.2126299

[[9]]
[1] 0.2276286

[[10]]
[1] 0.2064534
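
For what it is worth, the list above can be flattened and lined up against the cost grid. This is only a sketch: the numbers are copied from the output above, and the names rmsep, tuning and best are made up for illustration. Because sv.lin() draws a fresh random train/test split on every call, the differences between costs may be mostly sampling noise.

```r
# Cost grid as defined in the code above
NR <- 10
cost1 <- seq(0.5, 30, length = NR)

# RMSEP values copied from the printed list above; with the cluster result
# still in the workspace this would simply be unlist(RMSEP)
rmsep <- c(0.1921887, 0.1924917, 0.1885066, 0.1871466, 0.3550932,
           0.1226460, 0.2426345, 0.2126299, 0.2276286, 0.2064534)

# One row per cost value
tuning <- data.frame(cost = cost1, rmsep = rmsep)

# Row with the smallest test error (illustrative only, see the caveat above)
best <- tuning[which.min(tuning$rmsep), ]
print(best)
```

Repeating the whole run several times and averaging per cost would give a steadier picture than a single split per cost.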

-- 
David Winsemius, MD

On Nov 18, 2009, at 7:09 AM, raluca wrote:

>
> Hi Charlie,
>
>
> Yes, you are perfectly right, when I make the clusters I should put 2,
> not 10 (it remained 10 from previous trials with 10 slaves).
>
> cl<- makeCluster(2, type="SOCK" )
>
> To tell the truth I do not understand very well what the 2nd parameter
> for clusterApplyLB() has to be.
>
> If the function sv.lin has just 1 parameter, sv.lin(c), where c is the
> cost, how should I call clusterApplyLB?
>
>
> ? clusterApplyLB(cl, ?, sv.lin, c=cost1) ?
>
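
A small sketch of how the arguments line up, assuming the parallel package bundled with current versions of R (it provides the same clusterApplyLB() interface as snow) and a made-up function toy() standing in for sv.lin(). The second argument is the vector to iterate over: each call on a worker receives one element of it as the function's first argument, and any further named arguments are passed unchanged to every call.

```r
library(parallel)  # assumption: used here in place of snow; same function names

cl <- makeCluster(2)

# Hypothetical stand-in for sv.lin(): one cost in, one number out.
# 'offset' is an extra argument that stays the same for every call.
toy <- function(cost, offset = 0) cost * 2 + offset

cost1 <- seq(0.5, 30, length = 10)

# Calls toy(cost1[1]), toy(cost1[2]), ... spread over the two workers
res1 <- clusterApplyLB(cl, cost1, toy)

# Extra named arguments go to every call: toy(cost1[i], offset = 100)
res2 <- clusterApplyLB(cl, cost1, toy, offset = 100)

stopCluster(cl)

# Results come back as a list in the order of cost1
out1 <- unlist(res1)
out2 <- unlist(res2)
```

So with a one-parameter sv.lin(c), the vector of costs itself would go in the second position and nothing needs to be passed as c=.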
>
>
> Below, I am providing a working example, using the gasoline data that
> comes in the pls package.
>
> Thank you for your time!
>
>
> library(e1071)
> library(snow)
> library(pls)
>
> data(gasoline)
>
> X=gasoline$NIR
> Y=gasoline$octane
>
> NR=10
> cost1=seq(0.5,30, length=NR)
>
>
> sv.lin<- function(c) {
>
> for (i in 1:NR) {
>
> ind=sample(1:60,50)
> gTest<-  data.frame(Y=I(Y[-ind]),X=I(X[-ind,]))
> gTrain<- data.frame(Y=I(Y[ind]),X=I(X[ind,]))
>
> svm.lin   	  <- svm(gTrain$X,gTrain$Y, kernel="linear",cost=c[i],  
> cross=5)
> results.lin   <- predict(svm.lin, gTest$X)
>
> e.test.lin     <- sqrt(sum((results.lin-gTest$Y)^2)/length(gTest$Y))
>
> return(e.test.lin)
> }
> }
>
>
> cl<- makeCluster(2, type="SOCK" )
>
>
> clusterEvalQ(cl,library(e1071))
>
>
> clusterExport(cl,c("NR","Y","X"))
>
>
> RMSEP<-clusterApplyLB(cl,?,sv.lin,c=cost1)
>
> stopCluster(cl)
>
>
>
>
>
> cls59 wrote:
>>
>>
>> raluca wrote:
>>>
>>> Hello,
>>>
>>> This is the first time I am using the SNOW package and I am trying to
>>> tune the cost parameter for a linear SVM, where the cost (variable
>>> cost1) takes 10 values between 0.5 and 30.
>>>
>>> I have a large dataset and a pc which is not very powerful, so I need
>>> to tune the parameters using both CPUs of the pc.
>>>
>>> Somehow I cannot manage to do it. It seems that both CPUs are fitting
>>> the model for the same values of cost1, I guess the first 5, but not
>>> for the last 5.
>>>
>>> Please, can anyone help me!
>>>
>>> Here is the code:
>>>
>>> data <- data.frame(Y=I(Y),X=I(X))
>>> data.X<-data$X
>>> data.Y<-data$Y
>>>
>>>
>>
>>
>> Helping you will be difficult as we're only three lines into your
>> example and already I have no idea what the data you are using looks
>> like. Example code needs to be fully reproducible-- that means a small
>> slice of representative data needs to be provided or faked using an
>> appropriate random number generator.
>>
>> Some things did jump out at me about your approach and I've made some
>> notes below.
>>
>>
>>
>> raluca wrote:
>>>
>>> NR=10
>>> cost1=seq(0.5,30, length=NR)
>>>
>>> sv.lin<- function(cl,c) {
>>>
>>> for (i in 1:NR) {
>>>
>>> ind=sample(1:414,276)
>>>
>>> hogTest<-  data.frame(Y=I(data.Y[-ind]),X=I(data.X[-ind,]))
>>> hogTrain<- data.frame(Y=I(data.Y[ind]),X=I(data.X[ind,]))
>>>
>>> svm.lin   	  <- svm(hogTrain$X,hogTrain$Y,  
>>> kernel="linear",cost=c[i],
>>> cross=5)
>>> results.lin   <- predict(svm.lin, hogTest$X)
>>>
>>> e.test.lin     <- sqrt(sum((results.lin-hogTest$Y)^2)/ 
>>> length(hogTest$Y))
>>>
>>> return(e.test.lin)
>>> }
>>> }
>>>
>>> cl<- makeCluster(10, type="SOCK" )
>>>
>>
>>
>> If your machine has two cores, why are you setting up a cluster with 10
>> nodes?  Usually the number of nodes should equal the number of cores on
>> your machine in order to keep things efficient.
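
One way to size the cluster from the machine instead of hard-coding it, sketched with detectCores() from the parallel package bundled with current R (an assumption; snow itself is not used here). detectCores() can return NA on some platforms, so the sketch falls back to a single worker. The names n.cores, n.workers and n.started are made up for illustration.

```r
library(parallel)  # assumption: base R's parallel package, not snow

# Number of cores R can see; may be NA where it cannot be determined
n.cores <- detectCores()
if (is.na(n.cores)) n.cores <- 1

# Cap at 2, matching the two CPUs mentioned in this thread
n.workers <- min(2, n.cores)

cl <- makeCluster(n.workers)
n.started <- length(cl)   # a cluster object is a list with one entry per worker
stopCluster(cl)
```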
>>
>>
>>
>> raluca wrote:
>>>
>>>
>>> clusterEvalQ(cl,library(e1071))
>>>
>>> clusterExport(cl,c("data.X","data.Y","NR","cost1"))
>>>
>>> RMSEP<-clusterApplyLB(cl,cost1,sv.lin)
>>>
>>
>>
>> Are you sure this evaluation even produces results? sv.lin() is a
>> function you defined above that takes two parameters-- "cl" and "c".
>> clusterApplyLB() will feed values of cost1 into sv.lin() for the
>> argument "cl", but it has nothing to give for "c".  At the very least,
>> it seems like you would need something like:
>>
>>  RMSEP <- clusterApplyLB( cl, cost1, sv.lin, c = someVector )
>>
>>
>>
>> raluca wrote:
>>>
>>>
>>> stopCluster(cl)
>>>
>>>
>>
>>
>> Sorry I can't be very helpful, but with no data and no apparent way to
>> legally call sv.lin() the way you have it set up, I can't investigate
>> the problem to see if I get the same results you described.  If you
>> could provide a complete working example, then there's a better chance
>> that someone on this list will be able to help you.
>>
>> Good luck!
>>
>> -Charlie
>>
>
> -- 
> View this message in context: http://old.nabble.com/SVM-Param-Tuning-with-using-SNOW-package-tp26399401p26406709.html
> Sent from the R help mailing list archive at Nabble.com.
>



