[Rd] Most efficient way to check the length of a variable mentioned in a formula.

Duncan Murdoch murdoch.duncan at gmail.com
Tue Oct 21 10:17:14 CEST 2014


On 17/10/2014, 2:23 PM, Gabriel Becker wrote:
> Joris,
> 
> For me
> 
> length(environment(form)[["x"]])
> 
> was about twice as fast as
> 
> length(get("x",environment(form)))
> 
> in the year-old version of R (3.0.2) that I have on the virtual machine I'm
> currently using.

Those are different:  get() will also search the enclosing (parent)
environments, since inherits = TRUE is its default, but indexing an
environment with [[ looks only in that one environment.
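
A minimal sketch of that difference, under an assumed setup that is not from
the thread: the names outer_env and make_formula are hypothetical, and x is
deliberately placed in an enclosing environment rather than in the formula's
own environment.

```r
# Hypothetical setup: x lives in an outer environment, while the formula is
# created inside a function, so environment(form) is that function's frame.
outer_env <- new.env()
assign("x", 1:5, envir = outer_env)

make_formula <- function() {
  y <- 1      # a local variable; x is NOT defined in this frame
  y ~ x       # the formula captures this function's frame
}
environment(make_formula) <- outer_env
form <- make_formula()

# `[[` looks only in the formula's own environment, so x is not found there
# and NULL comes back without any error.
len_direct <- length(environment(form)[["x"]])            # 0

# get() follows the enclosing environments by default (inherits = TRUE).
len_get <- length(get("x", envir = environment(form)))    # 5

# With inherits = FALSE, get() is restricted to that one environment
# and signals an error instead of silently returning NULL.
strict <- tryCatch(
  get("x", envir = environment(form), inherits = FALSE),
  error = function(e) "not found"
)
```

Note that the indexing version fails silently here: it reports a length of 0
rather than an error, which may be harder to notice than the error from get().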

For the original question:  you really have no guarantee that the
length() function will do what you want if you evaluate it in an
environment set by the user, so the approach with get is more robust.
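
As a hedged illustration of that point (user_env and the masking length()
below are made up for the example, not taken from the thread), a user
environment that defines its own length() changes the result of the eval
approach but not of the get approach:

```r
# Hypothetical user environment that happens to define its own length().
user_env <- new.env()
assign("x", 1:3, envir = user_env)
assign("length", function(x) 42L, envir = user_env)

form <- y ~ x
environment(form) <- user_env

# eval approach: the whole call is evaluated in the user's environment,
# so the user's length() is the one that gets called.
len_eval <- eval(quote(length(x)), envir = environment(form))   # 42

# get approach: only x is fetched from the user's environment; length()
# is resolved where this code is written, which here is base::length.
len_get <- length(get("x", envir = environment(form)))          # 3
```

Writing base::length(...) explicitly would protect the get approach further
against a length() defined in the calling code itself.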

Duncan Murdoch

> 
> As for you, the eval method was much slower (though my factor was much
> larger than 20)
> 
>> system.time({thing <- replicate(10000,length(environment(form)[["x"]]))})
>    user  system elapsed
>   0.018   0.000   0.018
>> system.time({thing <- replicate(10000,length(get("x",environment(form))))})
>    user  system elapsed
>   0.031   0.000   0.033
>> system.time({thing <- replicate(10000,eval(parse(text = "length(x)"), envir=environment(form)))})
>    user  system elapsed
>   4.528   0.003   4.656
> 
> I can't say right now whether this pattern will hold in the more
> modern versions of R I typically use.
> 
> ~G
> 
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-pc-linux-gnu (64-bit)
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> On Fri, Oct 17, 2014 at 11:04 AM, Joris Meys <jorismeys at gmail.com> wrote:
> 
>> Dear R gurus,
>>
>> I need to know the length of a variable (let's call it X) that is
>> mentioned in a formula. So obviously I look up the environment in which
>> the formula was created, and then I have two options:
>>
>> - using eval(parse(text='length(X)'),
>>                     envir=environment(formula) )
>>
>> - using length(get('X',
>>                    envir=environment(formula)) )
>>
>> A bit of benchmarking showed that the first option is about 20 times
>> slower, so over 10,000 repetitions the second option saves me a bit more
>> than half a second. So speed is not really an issue here.
>>
>> Personally I'd go for option 2, as that one is easier to read and does the
>> job nicely, but with these functions I'm always a bit afraid that I'm
>> overlooking important details or side effects (possibly memory issues
>> when working with larger data).
>>
>> Does anybody have an idea what the dangers of these methods are, and
>> which one is the most robust?
>>
>> Thank you
>> Joris
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Mathematical Modelling, Statistics and Bio-Informatics
>>
>> tel : +32 9 264 59 87
>> Joris.Meys at Ugent.be
>> -------------------------------
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>