[R] Data frame operations getting slower when accessed by index

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Feb 22 00:16:13 CET 2007


What are D and M?   'Index' here could be a number or a name.
In either case, df[[D]] would be the equivalent of df$D.
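
A minimal sketch of that equivalence, assuming a small made-up data frame (the values and the names by_name and by_number are hypothetical, not from the original post):

```r
# Hypothetical toy data frame; only the column names D and M follow the thread.
df <- data.frame(D = c(1L, 15L, 1L), M = c(2L, 2L, 7L), x = c(0.1, 0.2, 0.3))

by_name   <- "D"                     # index given as a column name
by_number <- match("D", names(df))   # index given as a column position

# Each of these extracts the same vector as df$D.
same_name   <- identical(df[[by_name]], df$D)
same_number <- identical(df[[by_number]], df$D)
same_matrix <- identical(df[, by_number], df$D)
```

All three comparisons should come out TRUE, so the choice among them is one of convenience and speed rather than of result.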

However, your computation does not need a loop at all, let alone two.
Try something like

tmp <- with(df, paste(D, M))
dates <- unique(tmp)
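
As a rough illustration of the loop-free approach, here is a sketch on made-up data, assuming the columns are literally named D (day) and M (month). The names pairs and dates_sorted are hypothetical helpers. Note that unique() keeps values in order of first appearance, whereas the nested loop visits months in the outer loop and days in the inner one, so a sort is needed if that order matters:

```r
# Hypothetical toy data; D is day of month and M is month, as in the thread.
df <- data.frame(D = c(3L, 1L, 3L, 1L, 3L),
                 M = c(2L, 1L, 2L, 1L, 1L))

# Vectorised version: build one "day month" string per row, then drop repeats.
tmp <- with(df, paste(D, M))
dates <- unique(tmp)

# To reproduce the loop's ordering (by month, then by day within month),
# sort the distinct (D, M) pairs before pasting.
pairs <- unique(df[, c("D", "M")])
pairs <- pairs[order(pairs$M, pairs$D), ]
dates_sorted <- paste(pairs$D, pairs$M)
```

Both versions scan the data once instead of 372 times (12 x 31 filter passes), which is where most of the saving would be expected to come from.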



On Wed, 21 Feb 2007, Alp ATICI wrote:

> I have a data frame called df which has about 100 columns but thousands of
> rows. I set D to be the index of df$D and M to be the index of df$M.
> When I run the following loop as it is vs. df[,D] and df[,M] replaced with
> df$D and df$M there is a real big time difference in completion (with the
> latter being significantly faster). Is there an easy reason why and how I
> could speed up access?
>
> for (m in 1:12) {
>   for (d in 1:31) {
>     filter1 <- (df[, D] == d) & (df[, M] == m)
>     if (sum(filter1) > 0) {
>       ctr <- ctr + 1
>       dates[ctr] <- paste(as.character(d), as.character(m))
>     }
>   }
> }
>
> Thanks...

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


