[Rd] Most efficient way to check the length of a variable mentioned in a formula.

Joris Meys jorismeys at gmail.com
Tue Oct 21 11:06:50 CEST 2014


Hi Duncan,

Thanks for your reply. I don't completely follow, though, what you mean by
"no guarantee that the length() function will do what I want if I
evaluate it in an environment set by the user". I wasn't intending to give
the user the opportunity to set those environments, but is there something
I'm overlooking there?

Cheers
Joris

On Tue, Oct 21, 2014 at 10:17 AM, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:

> On 17/10/2014, 2:23 PM, Gabriel Becker wrote:
> > Joris,
> >
> > For me
> >
> > length(environment(form)[["x"]])
> >
> > Was about twice as fast as
> >
> > length(get("x",environment(form)))
> >
> > In the year-old version of R (3.0.2) that I have on the virtual machine I'm
> > currently using.
>
> Those are different:  get() will look in parent environments, but
> indexing an environment won't.
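
A minimal sketch of the difference described above. The environments
parent_env and child_env and the formula y ~ x are hypothetical, chosen so
that x exists only in the parent of the formula's environment:

```r
# Hypothetical setup: x lives in the parent, the formula's environment is the child.
parent_env <- new.env()
assign("x", 1:5, envir = parent_env)
child_env <- new.env(parent = parent_env)
form <- y ~ x
environment(form) <- child_env

# get() follows the chain of enclosing environments and finds x in the parent.
len_get <- length(get("x", envir = environment(form)))

# [[ looks only in the environment itself; x is not there, so it returns NULL.
len_index <- length(environment(form)[["x"]])

# inherits = FALSE restricts get() to the environment itself, like [[,
# but it signals an error instead of returning NULL.
strict <- tryCatch(
  get("x", envir = environment(form), inherits = FALSE),
  error = function(e) "not found"
)
```

Here len_get is 5 while len_index is 0, so the two calls only agree when x
is defined directly in environment(form).
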
>
> For the original question:  you really have no guarantee that the
> length() function will do what you want if you evaluate it in an
> environment set by the user, so the approach with get is more robust.
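
One possible reading of this point, as a minimal sketch: environment(form)
is whatever environment the formula was created in, and nothing stops that
environment from containing its own object called length. The names
user_env and form and the masking function below are hypothetical:

```r
# Hypothetical user environment that masks length() with its own function.
user_env <- new.env()
assign("x", 1:5, envir = user_env)
assign("length", function(x) "not a length", envir = user_env)
form <- y ~ x
environment(form) <- user_env

# eval() resolves both x and length in user_env, so the masked version is called.
via_eval <- eval(parse(text = "length(x)"), envir = environment(form))

# get() fetches only the data; length() is then resolved from where this
# code is written, not from user_env.
via_get <- length(get("x", envir = environment(form)))
```

Under these assumptions via_eval is the string returned by the masking
function, while via_get is 5. Writing base::length() would make the second
form independent of the calling environment as well.
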
>
> Duncan Murdoch
>
> >
> > As it was for you, the eval method was much slower for me (though my
> > factor was much larger than 20).
> >
> >> system.time({thing <- replicate(10000,length(environment(form)[["x"]]))})
> >    user  system elapsed
> >   0.018   0.000   0.018
> >> system.time({thing <- replicate(10000,length(get("x",environment(form))))})
> >    user  system elapsed
> >   0.031   0.000   0.033
> >> system.time({thing <- replicate(10000,eval(parse(text = "length(x)"),
> > envir=environment(form)))})
> >    user  system elapsed
> >   4.528   0.003   4.656
> >
> > I can't say right now whether this pattern will hold in the more
> > modern versions of R I typically use.
> >
> > ~G
> >
> >> sessionInfo()
> > R version 3.0.2 (2013-09-25)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> >
> > On Fri, Oct 17, 2014 at 11:04 AM, Joris Meys <jorismeys at gmail.com> wrote:
> >
> >> Dear R gurus,
> >>
> >> I need to know the length of a variable (let's call that X) that is
> >> mentioned in a formula. So obviously I look for the environment from which
> >> the formula is called and then I have two options:
> >>
> >> - using eval(parse(text='length(X)'),
> >>                     envir=environment(formula) )
> >>
> >> - using length(get('X',
> >>             envir=environment(formula) ))
> >>
> >> A bit of benchmarking showed that the first option is about 20 times
> >> slower, to the extent that over 10,000 repetitions option 2 saves more
> >> than half a second. So speed is not really an issue here.
> >>
> >> Personally I'd go for option 2 as that one is easier to read and does the
> >> job nicely, but with these functions I'm always a bit afraid that I'm
> >> overlooking important details or side effects (possibly memory issues
> >> when working with larger data).
> >>
> >> Does anybody have an idea what the dangers of these methods are, and
> >> which one is the most robust?
> >>
> >> Thank you
> >> Joris
> >>
> >> --
> >> Joris Meys
> >> Statistical consultant
> >>
> >> Ghent University
> >> Faculty of Bioscience Engineering
> >> Department of Mathematical Modelling, Statistics and Bio-Informatics
> >>
> >> tel : +32 9 264 59 87
> >> Joris.Meys at Ugent.be
> >> -------------------------------
> >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> >>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >
> >
> >
>
>




