[R] Problem to remove loops in a routine

Thu Aug 2 00:57:07 CEST 2007

First thing to do is to use Rprof to determine where the time is being
spent and then you can pinpoint what section of code is taking the
time.  A quick look says to do all your subsetting at once.  You might
look into using 'split' to create the subsets and then access the
subsets with "[[...]]".  So use Rprof and see if most of the time is
being spent the the data.frame subsetting functions.

On 8/1/07, Sébastien <pomchip at free.fr> wrote:
> Dear R-users,
>
> I have written the following code to generate some trellis plots. It
> works perfectly fine except that it is quite slow when it is apply to my
> typical datasets (over several thousands of lines). I believe the
> problem comes from the loops I am using to subset my data.frame. I read
> in the archives that the tapply function is often more efficient than a
> loop in R. Unfortunately , it seems that I am not enough familiar with
> the "philosophy" of this function to implement it in my code. Would you
> have some suggestions to speed up the whole thing?
>
> Thanks in advance
>
> Sebastien
>
> PS: the rationale behind these loops is to split the trellis plots on
> different pages, all plots on a page (or a group of pages) having a
> given combination of values for the PLOT, DVID, PER and GRP parameters.
>
> ###############################################
>
> library(lattice)
> rm(list=ls(all=TRUE))
>
> # Generate a dummy dataset with
> # - 20 individuals (ID)
> # - individuals 1 to 10 belong to group (GRP) 1, 11 to 20 belong to group 2
> # - measurements (DV) done at 10 time points (TIME) per individuals on 2
> occassions (OCC)
> # - modelisation of the DV versus TIME relationships with 4 different
> models (MODEL)
> # - predicted values (Y)
> # - the PLOT column serves as a flag to plot together the models (A and
> B) and (C and D)
>
> PLOT<-rep(1:2,each=40,times=20)
> ID<-rep(1:20,each=80)
> OCC<-rep(1:2,each=10,times=80)
> GRP<-as.numeric(rep(gl(2,80),times=10))
> MODEL<-as.vector(rep(gl(4,20,label=c("A","B","C","D")),times=20))
> TIME<-rep.int(1:10,160)
> DV<-OCC*(1:10)*rep(rnorm(20,50,10),each=80)+rep(rnorm(20,10,1),each=80)
> Y<-jitter(DV)
>
> mydata<-data.frame(PLOT,ID,OCC,GRP,MODEL,TIME,DV,Y)
>
> mydata$DVID<-rep.int(1,1600)   #in a real dataset, DVID could have
> typically 2 to 3 levels
>
> #############
> # Plotting routine
> #############
>
> myPath<-"C:/"                    #TO BE MODIFIED
>
> nTrellisCol<-2                                    #number of columns per
> Trellis plot
> nTrellisRow<-3                                    #number of lines per
> Trellis plot
>
> nDVID<-nlevels(factor(mydata$DVID))                #number of
> DVID=observations types
> nidPlot<-nlevels(factor(mydata$PLOT))            #number of items in the
> PLOT column
> nPer<-nlevels(factor(mydata$OCC))                  #number of occassions
> (OCC, PER, etc...)
> nGRP<-nlevels(factor(mydata$GRP))         #number of groups
>
> pdf(file=paste(myPath,"test.pdf",sep=""))
>
> trellis.par.set(par.main.text=list(cex=1))
> trellis.par.set(par.ylab.text=list(font=2))
> trellis.par.set(par.xlab.text=list(font=2))
>
> for (i in 1:nidPlot) {
> #loop on PLOT id
> #    i=1
>    idata<-subset(mydata,mydata$PLOT==i)
>
> for (j in 1:nDVID) {
> #loop on DVID
> #    j=1
>    ijdata<-subset(idata,idata$DVID==j)
>
>        for (k in 1:nPer) {
> #loop on Period
> #    k=1
>
>    ijkdata<-subset(ijdata,ijdata$OCC==k)
>
>         for (l in 1:nGRP) {                           #loop on Group
> # l=1
>
>  subdata<-subset(ijkdata,ijkdata$GRP==l)
>
>  nModel<-nlevels(factor(subdata$MODEL))
> #number of models to be plotted in this loop
>    mylegend<-c("Raw data",levels(factor(subdata$MODEL)))
>    subID<-nlevels(factor(subdata$ID))                       #number of
> ID in the new dataset
>
>  myplot<-xyplot(Y ~ TIME | ID,                               #creates plot
>            data = subdata, type = "l",
>            groups = MODEL,
>            observed = subdata$DV,
>            as.table=TRUE,
>            panel = function(x, y, ..., subscripts, observed) {
>                panel.points(x, pch=3,col=1,observed[subscripts])
>                panel.xyplot(x, y, ...,
> col=2:nlevels(subdata$MODEL),subscripts = subscripts)},
>
>            strip=function (which.panel,...){
>                col<-rep("Black",subID)
>                llines(c(0,1,1,0,0),c(0,0,1,1,0),col.line=1)
>                ltext(rep(0.5,subID),rep(0.5,subID),
>                paste("Subject
> ",levels(factor(subdata$ID))[which.panel],sep=""),cex=trellis.par.get("axis.text")[2])},
>
>            key=list(space="bottom",
>               lines = list(pch = as.integer(c(3,rep("",nModel))), type
> = c("p", gl(1,nModel,label="l")),
>                             col =
> 1:(nModel+1),"cex"=trellis.par.get("axis.text")[2]),
>                     text=list(mylegend,
> "cex"=trellis.par.get("axis.text")[2])),
>
>            xlab="Time (hr)",
>            ylab="Concentration (ng/mL)",
>            layout=c(nTrellisCol,nTrellisRow),
>
>            main=paste(paste(paste("Plot ",i,sep=""),
>                             paste(paste(", DVID ",j,sep=""),
>                                   paste(paste(", Occasion ",k,sep=""),
>                                         paste(", Group
> ",l,sep="")))),sep=""))
>
>
>  trellis.par.set(par.xlab.text=list(cex=trellis.par.get("axis.text")[2]))
>    trellis.par.set(par.ylab.text=list(cex=trellis.par.get("axis.text")[2]))
>
>
> print(myplot,panel.width=list(x=(0.75/nTrellisCol),units="npc"),panel.height=list(x=(0.50/nTrellisRow),units="npc"))
>
>        }}}}
> dev.off()
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?