[R] Can this code be written more efficiently?

jim holtman jholtman at gmail.com
Thu Sep 30 22:13:22 CEST 2010


Have you tried using Rprof to determine where the time is being spent in
the current code?  Have you looked at how much memory you are using?
Are you paging?  Have you run with an input of size 'x', then '2x', then
'4x' to see how both CPU time and memory usage grow?  This is
what I would do if I were trying to debug/optimize one of my scripts.
Before running something for a day, I would want to understand how the
processing time increases with the size of the input file so that I
would have an idea of how long to wait.
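
A rough sketch of that kind of check, using base R only. The toy data
frame, the glm() stand-in model and the helper name time_at_sizes() are
all made up for illustration; in the real script the stand-in would be
replaced by a single svm() call on subsets of the training data.

```r
# Time a fitting function on subsamples of increasing size (x, 2x, 4x, ...).
# 'fit_fun' is a placeholder for any function that takes a data frame.
time_at_sizes <- function(data, sizes, fit_fun) {
  stopifnot(max(sizes) <= nrow(data))
  elapsed <- vapply(sizes, function(n) {
    sub <- data[seq_len(n), , drop = FALSE]
    unname(system.time(fit_fun(sub))["elapsed"])
  }, numeric(1))
  data.frame(n = sizes, elapsed = elapsed)
}

# Stand-in data and model (assumptions, not the original data or SVM).
set.seed(1)
toy <- data.frame(matrix(rnorm(4000 * 21), ncol = 21))
toy$churn <- factor(rbinom(4000, 1, 0.5))
fit_toy <- function(d) glm(churn ~ ., data = d, family = binomial)

# How does elapsed time grow as the input doubles?
timings <- time_at_sizes(toy, c(500, 1000, 2000, 4000), fit_toy)
print(timings)

# Profile a few repeated fits to see which functions use the time.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
for (k in 1:10) invisible(fit_toy(toy))
Rprof(NULL)

# summaryRprof() stops with an error if no samples were collected,
# so guard it for very fast runs.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```

If the elapsed time roughly quadruples (or worse) each time the sample
size doubles, a back-of-the-envelope extrapolation to the full data set
and the full grid will say whether waiting is realistic.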

On Thu, Sep 30, 2010 at 1:40 PM, Guelman, Leo <leo.guelman at rbc.com> wrote:
> Dear users,
>
> I'm working on binary classification problem using Support Vector
> Machines (SVM). My objective is to train a series of SVM models on a
> grid of hyperparameters and then select those that maximize the AUC
> based on an independent validation sample.
>
> My attempted code is shown below. It runs well on "small" data sets but
> when I use it on a slightly larger sample (e.g., my train data is
> composed of about 8,000 observations on each class and 21 inputs), it
> takes "forever" to run (more than 1 day already and still running). I'm
> wondering if there's any way I can optimize this code. Thanks in advance
> for any help.
>
> I'm using 64-bit R 2.11.1 on Win 7.
>
> ####Start Code####
>
> library(e1071)
> library(ROCR)
>
> ### Create grid of hyperparameters
>
> G <- 2^seq(-15, 3, 2)   # gamma values: 2^-15, 2^-13, ..., 2^3
> C <- 2^seq(-5, 13, 2)   # cost values:  2^-5, 2^-3, ..., 2^13
> mygrid <- expand.grid(C=C, G=G)
>
> ### Train models
>
> svm.models <- lapply(seq_len(nrow(mygrid)), function(i) {
>     # e1071::svm() selects the machine with 'type', not 'method'
>     svm(churn.form, data = mytraindata,
>         type = "C-classification", kernel = "radial",
>         cost = mygrid$C[i], gamma = mygrid$G[i],
>         probability = TRUE)
> })
>
> ### Predict on test set
>
> pred.step3 <- numeric(length(svm.models))
>
> for (i in seq_along(svm.models)) {
>     pred.step1 <- predict(svm.models[[i]], myvaliddata,
>                           decision.values = FALSE, probability = TRUE)
>     pred.step2 <- prediction(
>         predictions = attr(pred.step1, "probabilities")[, 1],
>         labels = myvaliddata$churn)
>     pred.step3[i] <- performance(pred.step2, "auc")@y.values[[1]]
> }
>
> pred.step3
>
> ####End Code####
>
>
> Thanks,
> Leo.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


