[R] Resources for optimizing code

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Nov 5 19:04:55 CET 2004


On Fri, 5 Nov 2004, Janet Elise Rosenbaum wrote:

> 
> I want to eliminate certain observations in a large dataframe (21000x100).
> I have written code which does this using a binary vector (0=delete obs,
> 1=keep), but it uses for loops, and so it's slow and in the extreme it 
> causes R to hang for indefinite time periods.
> 
> I'm looking for one of two things:
> 1.  A document which discusses how to avoid for loops and situations in
> which it's impossible to avoid for loops.

`S Programming' covers this: see the FAQ for the reference.
But at the level of the example below, chapter 2 of MASS4 is enough (see 
the FAQ again for details).

> or
> 
> 2.  A function which can do the above better than mine.  
> 
> My code is pasted below.
> 
> Thanks so much,
> 
> Janet 
> 
> # asst is a binary vector of length= nrow(DATAFRAME).  
> # 1= observations you want to keep.  0= observation to get rid of.

How about DATAFRAME[asst == 1, ] ?

I am not sure whether asst contains NAs. If it does, your loop will stop 
with an error at
                if (asst[i]==1)
and if it does not, you don't need na.rm=T.
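
To make the NA point concrete, here is a small sketch. The toy data frame 
`df` and the vector `asst_na` are made up for illustration and are not from 
the original post. With a logical index, an NA in the condition does not 
raise an error but yields a row of NAs; wrapping the condition in which() 
keeps only the positions where the test is TRUE.

```r
# Toy data, invented for illustration only
df <- data.frame(x = 1:5, y = letters[1:5])
asst_na <- c(1, 0, NA, 1, 0)

# Logical indexing: the NA in asst_na yields an all-NA row
keep_logical <- df[asst_na == 1, ]
nrow(keep_logical)            # 3 rows, one of them all NA

# which() returns only the positions where the test is TRUE
keep_which <- df[which(asst_na == 1), ]
nrow(keep_which)              # 2 rows
```

Whether dropping the NA rows is the right policy depends on the data; the 
sketch only shows the difference in behaviour.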

> DF <- as.data.frame(matrix(rnorm(21000*100),, 100))
> asst <- rbinom(21000, 1, 0.7)
> DF2 <- DF[asst==1,]

where the subsetting took less than a second for me.

Note that your code converts DATAFRAME to a matrix. If that is reasonable 
(e.g. it is all numeric), then matrix indexing will be faster.
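
A rough sketch of the matrix route, assuming every column is numeric (sizes 
are copied from the simulated example above). The names `M` and `M2` and the 
seed are illustrative choices. Timings from system.time() vary by machine, 
so no figures are quoted here.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility
M <- matrix(rnorm(21000 * 100), ncol = 100)   # all-numeric data held as a matrix
DF <- as.data.frame(M)
asst <- rbinom(21000, 1, 0.7)

system.time(DF2 <- DF[asst == 1, ])              # data frame row subsetting
system.time(M2 <- M[asst == 1, , drop = FALSE])  # matrix row subsetting

# Both routes should keep the same rows and values
same <- isTRUE(all.equal(as.matrix(DF2), M2, check.attributes = FALSE))
same
```

The drop = FALSE argument keeps the result a matrix even if only one row 
is selected.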

> remove.xtra.f <-function(asst, DATAFRAME) {
> 	n<-sum(asst, na.rm=T)
> 	newdata<-matrix(nrow=n, ncol=ncol(DATAFRAME))
> 	j<-1
> 	for(i in 1:length(asst)) {
> 		if (asst[i]==1) {
> 			newdata[j,]<-DATAFRAME[i,]
> 			j<-j+1
> 		}
> 	}
> 	newdata.f<-as.data.frame(newdata)
> 	names(newdata.f)<-names(DATAFRAME)
> 	return(newdata.f)
> }
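
For comparison, a loop-free function with the same arguments might look like 
the following. This is a sketch, not a tested replacement: the name 
`remove_xtra_vec` is made up here, and it assumes that rows where asst is NA 
should be dropped, which the original post does not specify.

```r
# Hypothetical loop-free counterpart of remove.xtra.f
remove_xtra_vec <- function(asst, DATAFRAME) {
  stopifnot(length(asst) == nrow(DATAFRAME))
  # which() skips NAs, so rows with a missing asst are dropped (an assumption)
  DATAFRAME[which(asst == 1), , drop = FALSE]
}

# Usage on a small made-up data frame
d <- data.frame(id = 1:4, score = c(2.5, 3.1, 4.8, 1.9))
kept <- remove_xtra_vec(c(1, 0, 1, NA), d)
kept$id
```

Because the result is produced by indexing the data frame directly, column 
names and column types are kept without the round trip through a matrix.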
> --  
> Janet Rosenbaum                                 jerosenb at fas.harvard.edu
> PhD Candidate in Health Policy, Harvard GSAS
> Harvard Injury Control Research Center, Harvard School of Public Health
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



