[R] ncol() vs. length() on data.frames

Greg Snow 538280 using gmail.com
Fri Apr 3 17:45:20 CEST 2020


As others have pointed out, ncol ends up calling the length function on
a data frame (ncol(x) is dim(x)[2L], and dim on a data frame uses
length(x)), so you are pretty safe: both give the same result when
applied to the output of functions like read.csv. There will be a big
difference if you ever apply those functions to a matrix or some other
data structure.
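
To make the matrix caveat concrete, here is a small sketch using made-up objects (df, m and v are illustrative names, not from the original question). It only uses base R:

```r
# Data frame: length() counts list elements, i.e. columns
df <- data.frame(a = 1:3, b = 4:6)
length(df)   # 2
ncol(df)     # 2

# Matrix: length() counts every cell, ncol() counts columns
m <- matrix(1:6, nrow = 3)
length(m)    # 6
ncol(m)      # 2

# Plain vector: no dim attribute, so ncol() returns NULL
v <- 1:6
length(v)    # 6
ncol(v)      # NULL
```

So the two are only interchangeable as long as the object really is a data frame.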

One thing that I have not seen yet is a comparison on timing, so here goes:

> library(microbenchmark)
> microbenchmark(
+ length = length(iris),
+ ncol = ncol(iris)
+ )
Unit: nanoseconds
   expr  min   lq mean median   uq   max neval
 length  700  750  869    800  800  7400   100
   ncol 2400 2500 2981   2600 2700 31900   100

So ncol takes about 3 times as long to run as length on the iris data
frame (5 columns). You can rerun the above code with data frames closer
to the size that you will be using to see if that makes any difference.
But also notice that the units are nanoseconds, so the median time for
ncol to run (2.6 microseconds) is less than the time it takes light to
travel a kilometer in a vacuum, or about the time it takes light to go
1/3 of a mile through a fiber optic cable
(en.wikipedia.org/wiki/Microsecond).  If this is used as part of a
simulation or other repeated procedure and it is done one million
times, then using ncol instead of length will add about 2 seconds to
the overall run.  If this is just part of code where length/ncol will
be called fewer than 10 times, then nobody is going to notice.
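
For anyone who wants to check the arithmetic, here is a rough sketch. The two medians are copied from the microbenchmark output above; n_calls is just the one-million-call scenario, and the speed of light is the standard value in a vacuum:

```r
# Median timings in nanoseconds, copied from the benchmark output above
median_length_ns <- 800
median_ncol_ns   <- 2600

# Assumed scenario: one million calls inside a simulation loop
n_calls <- 1e6

# Extra cost of ncol() over length(), per call and in total
extra_ns_per_call <- median_ncol_ns - median_length_ns   # 1800 ns
extra_seconds     <- extra_ns_per_call * n_calls / 1e9   # 1.8 s
extra_seconds

# Time for light to travel 1 km in a vacuum, in nanoseconds (about 3336)
light_1km_ns <- 1000 / 299792458 * 1e9
median_ncol_ns < light_1km_ns                            # TRUE
```

The numbers will differ from machine to machine, so treat this as an order-of-magnitude estimate only.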

So the trade-off of moving from length to ncol is a slight decrease in
speed in exchange for an increase in readability.  I think that I would
go with the readability myself.
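
If you want to check this on something closer to your own data before switching, a sketch along these lines might help. The wide data frame is invented for illustration (1000 columns, 100 rows), and the timing uses base system.time rather than microbenchmark so that nothing extra needs to be installed:

```r
# Hypothetical wide data frame: 1000 numeric columns, 100 rows
set.seed(1)
wide <- as.data.frame(matrix(rnorm(100 * 1000), nrow = 100))

# Both calls should agree on the column count
stopifnot(identical(length(wide), ncol(wide)))

# Rough timing with base R: 1e5 repetitions of each call
reps <- 1e5
t_length <- system.time(for (i in seq_len(reps)) length(wide))[["elapsed"]]
t_ncol   <- system.time(for (i in seq_len(reps)) ncol(wide))[["elapsed"]]
c(length = t_length, ncol = t_ncol)
```

The elapsed times include the overhead of the for loop, so they are cruder than the microbenchmark numbers, but they should still show whether the gap matters for your use.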

On Tue, Mar 31, 2020 at 8:11 AM Ivan Calandra <calandra using rgzm.de> wrote:
>
> Thanks Ivan for the answer.
>
> So it confirms my first thought that these two functions are equivalent
> when applied to a "simple" data.frame.
>
> The reason I was asking is because I have gotten used to using length() in
> my scripts. It works perfectly and I understand it easily. But to be
> honest, ncol() is more intuitive to most users (especially the novice)
> so I was thinking about switching to using this function instead (all my
> data.frames are created from read.csv() or similar functions so there
> should not be any issue). But before doing that, I want to be sure that
> it is not going to create unexpected results.
>
> Thank you,
> Ivan
>
> --
> Dr. Ivan Calandra
> TraCEr, laboratory for Traceology and Controlled Experiments
> MONREPOS Archaeological Research Centre and
> Museum for Human Behavioural Evolution
> Schloss Monrepos
> 56567 Neuwied, Germany
> +49 (0) 2631 9772-243
> https://www.researchgate.net/profile/Ivan_Calandra
>
> On 31/03/2020 16:00, Ivan Krylov wrote:
> > On Tue, 31 Mar 2020 14:47:54 +0200
> > Ivan Calandra <calandra using rgzm.de> wrote:
> >
> >> On a simple data.frame (i.e. each element is a vector), ncol() and
> >> length() will give the same result.
> >> Are they just equivalent on such objects, or are there differences in
> >> some cases?
> > I am not aware of any exceptions to ncol(dataframe)==length(dataframe)
> > (in fact, ncol(x) is dim(x)[2L] and ?dim says that dim(dataframe)
> > returns c(length(attr(dataframe, 'row.names')), length(dataframe))), but
> > watch out for AsIs columns which can have columns of their own:
> >
> > x <- data.frame(I(volcano))
> > dim(x)
> > # [1] 87  1
> > length(x)
> > # [1] 1
> > dim(x[,1])
> > # [1] 87 61
> >
> >
>



-- 
Gregory (Greg) L. Snow Ph.D.
538280 using gmail.com


