[R] why is nrow() so slow?

David Winsemius dwinsemius at comcast.net
Tue Sep 15 17:45:03 CEST 2009


On Sep 15, 2009, at 10:48 AM, ivo welch wrote:

> dear R wizards:  here is the strange question for the day.  It seems to me
> that nrow() is very slow.  Let me explain what I mean:
>
> ds= data.frame( NA, x=rnorm(10000) )   ##  a sample data set
>
>> system.time( { for (i in 1:10000) NA } )   ## doing nothing takes virtually no time
>   user  system elapsed
>  0.000   0.000   0.001
>
> ## this is something that should take time; we need to add 10,000 values
> ## 10,000 times
>> system.time( { for (i in 1:10000) mean(ds$x) } )
>   user  system elapsed
>  0.416   0.001   0.416
>
> ## alas, this should be very fast.  it is just reading off an attribute of
> ## ds.  it takes almost a quarter of the time of mean()!
>> system.time( { for (i in 1:10000) nrow(ds) } )
>   user  system elapsed
>  0.124   0.001   0.125

I am guessing that you are coming from a statistical paradigm where there
is an implicit looping construct in a data step. In R you find the number
of rows not with a loop, but with the nrow function used just once.

 > ds= data.frame( NA, x=rnorm(10000) )
 > system.time(nrow(ds))
    user  system elapsed
       0       0       0
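
For scale, the quoted timings can be turned into a cost per call. This is
only arithmetic on the numbers posted above; the names elapsed, n_calls and
per_call_us are made up for the illustration.

```r
# Elapsed times in seconds, copied from the quoted timings (10,000 calls each).
elapsed <- c(mean = 0.416, nrow = 0.125, length = 0.041)
n_calls <- 10000

# Cost of a single call, in microseconds.
per_call_us <- elapsed / n_calls * 1e6
print(per_call_us)

# nrow() time as a fraction of the mean() time.
nrow_vs_mean <- elapsed[["nrow"]] / elapsed[["mean"]]
print(nrow_vs_mean)
```

So a single nrow() call costs on the order of ten microseconds on that
machine, which only adds up when it is called many thousands of times.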


>
> ## here is an alternative way to implement nrows, which is already much
> ## faster:
>> system.time( { for (i in 1:10000) length(ds$x) } )
>   user  system elapsed
>  0.041   0.000   0.041
>
> is there a faster way to learn how big a data frame is?

 > length(ds)
[1] 2
 > nrow(ds)
[1] 10000

# Or:

 > dim(ds)
[1] 10000     2
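
If the per-call overhead really does matter, here is a rough sketch of some
cheaper ways to read the row count. It assumes an R version in which the
base function .row_names_info() is available; nrow() goes through dim(),
which for a data frame is built on that function, so calling it directly
skips a couple of function calls. The name n_reps is only for illustration,
and timings will differ by machine, so none are shown.

```r
ds <- data.frame(NA, x = rnorm(10000))

# Three ways to get the row count; all should agree.
n1 <- nrow(ds)                  # calls dim(), which dispatches to dim.data.frame()
n2 <- length(ds$x)              # length of a single column
n3 <- .row_names_info(ds, 2L)   # reads the count from the row-names attribute

stopifnot(n1 == 10000, n2 == 10000, n3 == 10000)

# Compare the looped timings on your own machine.
n_reps <- 10000
system.time(for (i in seq_len(n_reps)) nrow(ds))
system.time(for (i in seq_len(n_reps)) .row_names_info(ds, 2L))
```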

> I know this sounds silly, but this is inside a "by" statement, where I
> figure out how many observations are in each subset.  strangely, this takes
> a whole lot of time.  I don't believe it is possible to ask "by" to attach
> an attribute to the data frame that stores the number of observations that
> it is actually passing.
>
> pointers appreciated.
>
> regards,
>
> /iaw
> -- 
> Ivo Welch (ivo.welch at brown.edu, ivo.welch at gmail.com)
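
For the "by" use case, the group sizes can usually be computed in one call
rather than by calling nrow() on every subset. A minimal sketch, assuming
the grouping variable is a column of the data frame; the column name g and
the three-level grouping are invented for the example, since the original
post does not show the real data.

```r
set.seed(1)
ds <- data.frame(g = sample(letters[1:3], 10000, replace = TRUE),
                 x = rnorm(10000))

# Count observations per group in a single vectorized call.
counts <- table(ds$g)

# The same counts via tapply(), which splits only one column.
counts2 <- tapply(ds$x, ds$g, length)

# The by() + nrow() approach subsets the whole data frame for each group;
# it gives the same numbers but typically does more work.
counts3 <- by(ds, ds$g, nrow)

stopifnot(all(as.integer(counts) == as.integer(counts2)),
          all(as.integer(counts) == as.integer(counts3)),
          sum(counts) == nrow(ds))
```

The counts can then be looked up by group name, e.g. counts[["a"]], instead
of being recomputed inside the function passed to by().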

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



