[R] speed up process

jim holtman jholtman at gmail.com
Fri Feb 25 14:29:15 CET 2011


Call Rprof() to start profiling, run your code, then call Rprof(NULL) to stop it:


Rprof()
# ... code you want to profile ...
Rprof(NULL)  # stop profiling; the samples are in Rprof.out
summaryRprof()

example:


> Rprof()
> for (i in 1:1e6) sin(i) + cos(i) + sqrt(i)
> Rprof(NULL)
> summaryRprof()
$by.self
     self.time self.pct total.time total.pct
sin       0.24    30.77       0.24     30.77
sqrt      0.22    28.21       0.22     28.21
cos       0.16    20.51       0.16     20.51
+         0.14    17.95       0.14     17.95
:         0.02     2.56       0.02      2.56

$by.total
     total.time total.pct self.time self.pct
sin        0.24     30.77      0.24    30.77
sqrt       0.22     28.21      0.22    28.21
cos        0.16     20.51      0.16    20.51
+          0.14     17.95      0.14    17.95
:          0.02      2.56      0.02     2.56

$sample.interval
[1] 0.02

$sampling.time
[1] 0.78
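
Applied to a nested loop like the one in the quoted message, the same pattern could look like the sketch below. Note that the first argument of Rprof() is the name of the output file, not the code to profile. A 'for' loop evaluates to NULL, so Rprof(for (...) {...}) runs the loop and then effectively calls Rprof(NULL), which switches profiling off; that would explain why summaryRprof() found no lines. In the sketch, slow_fit() is only a hypothetical stand-in for the expensive call (it is not tstsreg()), and the loop bounds, the temporary file and the 0.01 s interval are assumptions chosen for illustration.

```r
# Hypothetical stand-in for the expensive per-dataset work (NOT tstsreg itself).
slow_fit <- function(n) {
  x <- 0
  for (i in seq_len(n)) x <- x + sin(i) + cos(i) + sqrt(i)
  x
}

# Write the samples to a temporary file instead of Rprof.out (an assumption,
# only to keep the working directory clean).
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.01)   # start sampling every 0.01 s
for (k in 1:2) {                    # outer loop, in the role of seq.yvar
  for (j in 1:3) {                  # inner loop, in the role of mydata_list
    slow_fit(2e6)
  }
}
Rprof(NULL)                         # stop sampling

prof <- summaryRprof(prof_file)
head(prof$by.total)                 # functions ranked by total time, callees included
```

$by.total counts the time spent in a function and in everything it calls, so it is usually the table to read first when you want to know which of your own functions dominates; $by.self shows where the time is actually burned.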


On Fri, Feb 25, 2011 at 6:57 AM, Ivan Calandra
<ivan.calandra at uni-hamburg.de> wrote:
> Dear Jim,
>
> I've tried to use Rprof() as you advised me, but I don't understand how it
> works.
> I've done this:
> Rprof(for (i in seq_along(seq.yvar)){
>  all_my_commands
> })
> summaryRprof()
>
> But I got this error:
> Error in summaryRprof() : no lines found in ‘Rprof.out’
>
> I couldn't really understand from the help page what I should do.
>
> In any case, the function tstsreg() is certainly what takes the most
> computing time. But I wanted to optimize the rest of the code to gain as
> much speed as possible.
>
> Ivan
>
> On 2/25/2011 12:30, Jim Holtman wrote:
>>
>> Use Rprof to find where time is being spent. It is probably in 'plot', which
>> might imply it is not the 'for' loop and is therefore beyond your control.
>>
>> Sent from my iPad
>>
>> On Feb 25, 2011, at 6:19, Ivan Calandra<ivan.calandra at uni-hamburg.de>
>>  wrote:
>>
>>> Thanks Nick for your quick answer.
>>> It does work (no missed bracket!) but unfortunately doesn't really speed
>>> up anything: with my real data, it takes 82.78 s with the double
>>> lapply() instead of 83.59 s with the double loop (a gain of about 0.8 s).
>>>
>>> It looks like my double loop was not that bad. Does anyone know another
>>> faster way to do this?
>>>
>>> Thanks again in advance,
>>> Ivan
>>>
>>> On 2/25/2011 11:41, Nick Sabbe wrote:
>>>>
>>>> Simply avoid the for loops by using lapply (I may have missed a bracket
>>>> here or there because I did this without opening R).
>>>> I haven't checked the speed-up, though.
>>>>
>>>> lapply(seq.yvar, function(k){
>>>>    plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p",
>>>> xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
>>>>    lapply(seq_along(mydata_list), function(j){
>>>>      foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
>>>> pos=mypos[j], name.dat=names(mydata_list)[j])
>>>>      return(NULL)
>>>>    })
>>>>    invisible(NULL)
>>>> })
>>>>
>>>> HTH,
>>>>
>>>> Nick Sabbe
>>>> --
>>>> ping: nick.sabbe at ugent.be
>>>> link: http://biomath.ugent.be
>>>> wink: A1.056, Coupure Links 653, 9000 Gent
>>>> ring: 09/264.59.36
>>>>
>>>> -- Do Not Disapprove
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>>>> On
>>>> Behalf Of Ivan Calandra
>>>> Sent: Friday, 25 February 2011 11:20
>>>> To: r-help
>>>> Subject: [R] speed up process
>>>>
>>>> Dear users,
>>>>
>>>> I have a double for loop that does exactly what I want, but is quite
>>>> slow. It is not so much with this simplified example, but IRL it is
>>>> slow.
>>>> Can anyone help me improve it?
>>>>
>>>> The data and code for foo_reg() are available at the end of the email; I
>>>> preferred going directly into the problematic part.
>>>> Here is the code (I tried to simplify it but I cannot do it too much or
>>>> else it wouldn't represent my problem). It might also look too complex
>>>> for what it is intended to do, but my colleagues who are also supposed
>>>> to use it don't know much about R. So I wrote it so that they don't have
>>>> to modify the critical parts to run the script for their needs.
>>>>
>>>> #column indexes for function
>>>> ind.xvar<- 2
>>>> seq.yvar<- 3:4
>>>> #position vector for legend(), stupid positioning but it doesn't matter
>>>> #here
>>>> mypos<- c("topleft", "topright","bottomleft")
>>>>
>>>> #run the function for columns 3&4 as y (seq.yvar) with column 2 as x
>>>> #(ind.xvar) for all 3 datasets (mydata_list)
>>>> par(mfrow=c(2,1))
>>>> for (i in seq_along(seq.yvar)){
>>>>    k<- seq.yvar[i]
>>>>    plot(mydata1[[k]]~mydata1[[ind.xvar]], type="p",
>>>> xlab=names(mydata1)[ind.xvar], ylab=names(mydata1)[k])
>>>>    for (j in seq_along(mydata_list)){
>>>>      foo_reg(dat=mydata_list[[j]], xvar=ind.xvar, yvar=k, mycol=j,
>>>> pos=mypos[j], name.dat=names(mydata_list)[j])
>>>>    }
>>>> }
>>>>
>>>> I tried with lapply() or mapply() but couldn't manage to pass the
>>>> arguments for names() and col= correctly, e.g. for the 2nd loop:
>>>> lapply(mydata_list, FUN=function(x){foo_reg(dat=x, xvar=ind.xvar,
>>>> yvar=k, col1=1:3, pos=mypos[1:3], name.dat=names(x)[1:3])})
>>>> mapply(FUN=function(x) {foo_reg(dat=x, name.dat=names(x)[1:3])},
>>>> mydata_list, col1=1:3, pos=mypos, MoreArgs=list(xvar=ind.xvar, yvar=k))
>>>>
>>>> Thanks in advance for any hints.
>>>> Ivan
>>>>
>>>>
>>>>
>>>>
>>>> #create data (it looks horrible with these datasets but it doesn't
>>>> #matter here)
>>>> mydata1<- structure(list(species = structure(1:8, .Label = c("alsen",
>>>> "gogor", "loalb", "mafas", "pacyn", "patro", "poabe", "thgel"), class =
>>>> "factor"), fruit = c(0.52, 0.45, 0.43, 0.82, 0.35, 0.9, 0.68, 0), Asfc =
>>>> c(207.463765, 138.5533755, 70.4391735, 160.9742745, 41.455809,
>>>> 119.155109, 26.241441, 148.337377), Tfv = c(47068.1437773483,
>>>> 43743.8087431582, 40323.5209129239, 23420.9455581495, 29382.6947428651,
>>>> 50460.2202192311, 21810.1456510625, 41747.6053810881)), .Names =
>>>> c("species", "fruit", "Asfc", "Tfv"), row.names = c(NA, 8L), class =
>>>> "data.frame")
>>>>
>>>> mydata2<- mydata1[!(mydata1$species %in% c("thgel","alsen")),]
>>>> mydata3<- mydata1[!(mydata1$species %in% c("thgel","alsen","poabe")),]
>>>> mydata_list<- list(mydata1=mydata1, mydata2=mydata2, mydata3=mydata3)
>>>>
>>>> #function for regression
>>>> library(WRS)
>>>> foo_reg<- function(dat, xvar, yvar, mycol, pos, name.dat){
>>>>   tsts<- tstsreg(dat[[xvar]], dat[[yvar]])
>>>>   tsts_inter<- signif(tsts$coef[1], digits=3)
>>>>   tsts_slope<- signif(tsts$coef[2], digits=3)
>>>>   abline(tsts$coef, lty=1, col=mycol)
>>>>   legend(x=pos, legend=c(paste("TSTS ",name.dat,": Y=",tsts_inter,"+",
>>>> tsts_slope,"X",sep="")), lty=1, col=mycol)
>>>> }
>>>>
>>> --
>>> Ivan CALANDRA
>>> PhD Student
>>> University of Hamburg
>>> Biozentrum Grindel und Zoologisches Museum
>>> Abt. Säugetiere
>>> Martin-Luther-King-Platz 3
>>> D-20146 Hamburg, GERMANY
>>> +49(0)40 42838 6231
>>> ivan.calandra at uni-hamburg.de
>>>
>>> **********
>>> http://www.for771.uni-bonn.de
>>> http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


