[R] ncol() vs. length() on data.frames
Ivan Calandra <calandra using rgzm.de>
Mon Apr 6 08:48:23 CEST 2020
Thank you Greg for the insights!
I agree with you that the small gain in speed is not worth the loss in
readability, and I'll change my length() calls to ncol().
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
56567 Neuwied, Germany
+49 (0) 2631 9772-243
On 03/04/2020 17:45, Greg Snow wrote:
> As others have pointed out, ncol ends up calling length on a data frame
> (via dim), so you can expect the same result from both when they are
> applied to the output of functions like read.csv. There will be a big
> difference, though, if you ever apply them to a matrix or some other
> data structure.
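To make the matrix caveat concrete, here is a minimal sketch using only base R and the built-in iris dataset. The variable names are mine, chosen for illustration. On a data frame the two calls agree, while on a matrix length() counts every cell and on a plain vector ncol() has nothing to report.

```r
# Data frame: a list of columns, so length() counts columns.
df_len  <- length(iris)        # 5 columns
df_ncol <- ncol(iris)          # 5 columns

# Matrix: an atomic vector with a dim attribute, so length() counts cells.
m      <- as.matrix(iris[, 1:4])
m_len  <- length(m)            # 150 rows * 4 columns
m_ncol <- ncol(m)              # 4 columns

# Plain vector: no dim attribute, so ncol() returns NULL.
v_ncol <- ncol(1:10)

print(c(df_len = df_len, df_ncol = df_ncol, m_len = m_len, m_ncol = m_ncol))
print(is.null(v_ncol))
```

So the switch from length() to ncol() is safe for data frames, but the two are not interchangeable in general.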
> One thing that I have not seen yet is a comparison on timing, so here goes:
> > library(microbenchmark)
> > microbenchmark(
> +   length = length(iris),
> +   ncol = ncol(iris)
> + )
> Unit: nanoseconds
>    expr  min   lq mean median   uq   max neval
>  length  700  750  869    800  800  7400   100
>    ncol 2400 2500 2981   2600 2700 31900   100
> So ncol takes about 3 times as long to run as length on the iris data
> frame (5 columns). You can rerun the above code with data frames closer
> to the size that you will be using to see if that makes any difference.
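For anyone who wants to try this on a larger data frame without installing extra packages, a rough sketch with base R's system.time() might look like the following. The data frame `wide` and the repeat count `n_calls` are made up for illustration, and the timings will vary by machine, so no particular numbers are claimed here.

```r
# Hypothetical wide data frame: 100 rows, 1000 numeric columns.
wide <- as.data.frame(matrix(rnorm(100 * 1000), nrow = 100))

# Repeat each call many times so the total becomes measurable.
n_calls <- 1e5

t_length <- system.time(for (i in seq_len(n_calls)) length(wide))[["elapsed"]]
t_ncol   <- system.time(for (i in seq_len(n_calls)) ncol(wide))[["elapsed"]]

# Both calls must still agree on the answer; only the timings differ.
same_answer <- identical(length(wide), ncol(wide))

print(c(length_seconds = t_length, ncol_seconds = t_ncol))
print(same_answer)
```

This is coarser than a dedicated benchmarking package, but it is enough to see whether the gap matters at a realistic size.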
> But also notice that the units are nanoseconds, so the median time for
> ncol to run is less than the time it takes light to travel a kilometer
> in a vacuum, or about the time it takes light to go 1/3 of a mile
> through a fiber optic cable (en.wikipedia.org/wiki/Microsecond). If
> this is used as part of a simulation or other repeated procedure and
> it is done one million times then you will add about 2 seconds to the
> overall run. If this is just part of code where length/ncol will be
> called fewer than 10 times then nobody is going to notice.
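The arithmetic behind those figures can be checked in a few lines. This sketch only reuses the two median values from the timing table; the variable names are mine, and the speed of light in a vacuum is the standard defined value.

```r
median_length_ns <- 800     # median for length() in the table
median_ncol_ns   <- 2600    # median for ncol() in the table
n_calls          <- 1e6     # one million repeated calls

# Extra time from using ncol() instead of length(), in seconds.
extra_seconds <- (median_ncol_ns - median_length_ns) * n_calls / 1e9

# How many times slower ncol() is at the median.
ratio <- median_ncol_ns / median_length_ns

# Distance light covers in a vacuum during one median ncol() call, in metres.
light_metres <- 299792458 * median_ncol_ns / 1e9

print(c(extra_seconds = extra_seconds, ratio = ratio, light_metres = light_metres))
```

The extra cost comes out just under 2 seconds per million calls, and light covers a bit less than a kilometre in the time of one ncol() call.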
> So the trade-off of moving from length to ncol is a slight decrease in
> speed for an increase of readability. I think that I would go with
> the readability myself.
> On Tue, Mar 31, 2020 at 8:11 AM Ivan Calandra <calandra using rgzm.de> wrote:
>> Thanks Ivan for the answer.
>> So it confirms my first thought that these two functions are equivalent
>> when applied to a "simple" data.frame.
>> The reason I was asking is that I have gotten used to using length() in
>> my scripts. It works perfectly and I understand it easily. But to be
>> honest, ncol() is more intuitive to most users (especially the novice)
>> so I was thinking about switching to using this function instead (all my
>> data.frames are created from read.csv() or similar functions so there
>> should not be any issue). But before doing that, I want to be sure that
>> it is not going to create unexpected results.
>> Thank you,
>> Dr. Ivan Calandra
>> TraCEr, laboratory for Traceology and Controlled Experiments
>> MONREPOS Archaeological Research Centre and
>> Museum for Human Behavioural Evolution
>> Schloss Monrepos
>> 56567 Neuwied, Germany
>> +49 (0) 2631 9772-243
>> On 31/03/2020 16:00, Ivan Krylov wrote:
>>> On Tue, 31 Mar 2020 14:47:54 +0200
>>> Ivan Calandra <calandra using rgzm.de> wrote:
>>>> On a simple data.frame (i.e. each element is a vector), ncol() and
>>>> length() will give the same result.
>>>> Are they just equivalent on such objects, or are there differences in
>>>> some cases?
>>> I am not aware of any exceptions to ncol(dataframe)==length(dataframe)
>>> (in fact, ncol(x) is dim(x)[2L] and ?dim says that dim(dataframe)
>>> returns c(length(attr(dataframe, 'row.names')), length(dataframe))), but
>>> watch out for AsIs columns which can have columns of their own:
>>> x <- data.frame(I(volcano))
>>> dim(x)
>>> # [1] 87  1
>>> length(x)
>>> # [1] 1
>>> dim(x[, 1])
>>> # [1] 87 61
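As a small self-contained check of the AsIs case, the sketch below uses the built-in volcano matrix and compares what ncol(), length() and dim() report. The variable names are only illustrative. The point is that ncol() and length() still agree on the data frame itself; the 61 columns belong to the matrix stored inside its single column.

```r
# One AsIs column holding the whole 87 x 61 volcano matrix.
x <- data.frame(I(volcano))

outer_cols <- ncol(x)        # columns of the data frame itself
list_len   <- length(x)      # same count, since a data frame is a list of columns
inner_dim  <- dim(x[[1]])    # dimensions of the matrix stored in that column

# ncol(x) is defined as dim(x)[2L], so the two must match.
agrees <- identical(ncol(x), dim(x)[2L])

print(c(outer_cols = outer_cols, list_len = list_len))
print(inner_dim)
print(agrees)
```

Code that expects one value per column, for example when looping over columns, needs to account for such nested columns separately.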