[R] eval(parse(text vs. get when accessing a function

Sat Jan 6 20:23:27 CET 2007

Hi Martin,

On 1/6/07, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> Hi Ramon,
>
> It seems like a naming convention (f.xxx) and eval(parse(...)) are
> standing in for objects (of class 'GeneSelector', say, representing a
> function with a particular form and doing a particular operation) and
> dispatch (a function 'geneConverter' might handle a converter of class
> 'GeneSelector' one way, user supplied ad-hoc functions more carefully;
> inside geneConverter the only real concern is that the converter
> argument is in fact a callable function).
>
> eval(parse(...)) brings scoping rules to the fore as an explicit
> programming concern; here scope is implicit, but that's probably better
> -- R will get its own rules right.
>
> Martin
>
> Here's an S4 sketch:
>
> setClass("GeneSelector",
>          contains="function",
>          representation=representation(description="character"),
>          validity=function(object) {
>              msg <- NULL
>              argNames <- names(formals(object))
>              if (!identical(argNames[1], "x"))
>                msg <- c(msg, "\n  GeneSelector requires a first argument named 'x'")
>              if (!"..." %in% argNames)
>                msg <- c(msg, "\n  GeneSelector requires '...' in its signature")
>              if (0==length(object@description))
>                msg <- c(msg, "\n  Please describe your GeneSelector")
>              if (is.null(msg)) TRUE else msg
>          })
>
> setGeneric("geneConverter",
>            function(converter, x, ...) standardGeneric("geneConverter"),
>            signature=c("converter"))
>
> setMethod("geneConverter",
>           signature(converter="GeneSelector"),
>           function(converter, x, ...) {
>               ## important stuff here
>               converter(x, ...)
>           })
>
> setMethod("geneConverter",
>           signature(converter="function"),
>           function(converter, x, ...) {
>               message("ad-hoc converter; hope it works!")
>               converter(x, ...)
>           })
>
> and then...
>
> > c1 <- new("GeneSelector",
> +           function(x, ...) prod(x, ...),
> +           description="Product of x")
> >
> > c2 <- new("GeneSelector",
> +           function(x, ...) sum(x, ...),
> +           description="Sum of x")
> >
> > geneConverter(c1, 1:4)
> [1] 24
> > geneConverter(c2, 1:4)
> [1] 10
> > geneConverter(mean, 1:4)
> ad-hoc converter; hope it works!
> [1] 2.5
> >
> > cvterr <- new("GeneSelector", function(y) {})
> Error in validObject(.Object) : invalid class "GeneSelector" object: 1:
>   GeneSelector requires a first argument named 'x'
> invalid class "GeneSelector" object: 2:
>   GeneSelector requires '...' in its signature
> invalid class "GeneSelector" object: 3:
>   Please describe your GeneSelector
> > xxx <- 10
> > geneConverter(xxx, 1:4)
> Error in function (classes, fdef, mtable)  :
>         unable to find an inherited method for function "geneConverter", for signature "numeric"
>

Thanks!! That is actually a rather interesting alternative approach
and I can see it also adds a lot of structure to the problem. I have
to confess, though, that I am not a fan of OOP (nor of S4 classes); in
this case, in particular, it seems there is a lot of scaffolding in
the code above (the counterpoint to the structure?) and, regarding
scoping rules, I prefer to think about them explicitly (I find it much
simpler than inheritance).
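
A minimal sketch of the explicit-scoping route, assuming hypothetical
selectors with dummy bodies (the real geneSelect.* functions are not shown
in this thread) and a simplified cvFunct signature. get() with
mode = "function" does the same name-based lookup as eval(parse(text = ...))
without parsing arbitrary text, and it skips non-function objects that
happen to share the name:

```r
## Hypothetical selectors with dummy bodies (illustration only).
geneSelect.Fratio   <- function(x, y, z) "Fratio"
geneSelect.Wilcoxon <- function(x, y, z) "Wilcoxon"

## Simplified stand-in for the core function; argument names are assumptions.
cvFunct <- function(x, y, z, genefiltertype) {
    fname <- paste("geneSelect", genefiltertype, sep = ".")
    ## The lookup starts in cvFunct's own frame and then follows its
    ## enclosing environments. mode = "function" skips any non-function
    ## object of the same name and raises an error if no function is found.
    internalGeneSelect <- get(fname, mode = "function")
    internalGeneSelect(x, y, z)
}

x <- matrix(0, nrow = 2, ncol = 2)
res1 <- cvFunct(x, c(1, 2), 1L, "Fratio")
res2 <- cvFunct(x, c(1, 2), 1L, "Wilcoxon")
```

An unknown postfix, e.g. cvFunct(x, c(1, 2), 1L, "Nope"), stops with an
error from get() rather than running something unintended.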

Best,

R.

>
> "Ramon Diaz-Uriarte" <rdiaz02 at gmail.com> writes:
>
> > Dear Greg,
> >
> >
> > On 1/5/07, Greg Snow <Greg.Snow at intermountainmail.org> wrote:
> >> Ramon,
> >>
> >> I prefer to use the list method for this type of thing; here are a couple of reasons why (maybe you are more organized than me and would never do some of the stupid things that I have, so these don't apply to you, but you can see that the general suggestion applies to some of the rest of us).
> >>
> >
> >
> > Those suggestions do apply to me of course (no claim to being
> > organized nor beyond idiocy here). And actually the suggestions on
> > this thread are proving very useful. I think, though, that I was not
> > very clear on the context and my examples were too dumbed down. So
> > I'll try to give more detail (nothing here is secret, I am just trying
> > not to bore people).
> >
> > The code is part of a web-based application, so there is no
> > interactive user. The R code is passed the arguments (and optional
> > user functions) from the CGI.
> >
> > There is one "core" function (call it cvFunct) that, among other
> > things, does cross-validation. So this is one way to do things:
> >
> > cvFunct <- function(whatever, genefiltertype, whateverelse) {
> >       internalGeneSelect <- eval(parse(text = paste("geneSelect",
> >                                              genefiltertype, sep = ".")))
> >
> >       ## do things calling internalGeneSelect,
> > }
> >
> > and now define all possible functions as
> >
> > geneSelect.Fratio <- function(x, y, z) {
> >     ## something
> > }
> > geneSelect.Wilcoxon <- function(x, y, z) {
> >     ## something else
> > }
> >
> > If I want more geneSelect functions, adding them is simple. And I can
> > even allow the user to pass her/his own functions, with the only
> > restrictions being that the function takes three args, x, y, z, and
> > that it is named "geneSelect." plus a user-chosen string. (Yes, I need
> > to make sure there are no calls to "system", etc., in the user code,
> > but that is another issue.)
> >
> > The general idea is not new of course. For instance, in package
> > "e1071", a somewhat similar thing is done in function "tune", and
> > David Meyer there uses "do.call". However, tune is a lot more general
> > than what I had in mind. For instance, "tune" deals with arbitrary
> > functions, with arbitrary numbers and names of parameters, whereas my
> > functions above all take only three arguments (x: a matrix, y: a
> > vector; z: an integer), so the neat functionality provided by
> > "do.call", and passing the args as a list is not really needed.
> >
> > So, given that my situation is so structured, and I do not need
> > "do.call", I think the approach via eval(parse(paste makes my life
> > simple:
> >
> > a) the central function (cvFunct) uses something I can easily
> > recognize: "internalGeneSelect"
> >
> > b) after the initial eval(parse(text I do not need to worry anymore
> > about what the "true" gene selection function is called
> >
> > c) adding new functions and calling them is simple: function naming
> > follows a simple pattern ("geneSelect." + postfix) and calling the
> > user function only requires passing the postfix to cvFunct.
> >
> > d) notice also that the functions, at least the ones I define, will of
> > course not be named "f.1", etc., but rather things like
> > "geneSelect.Fratio" or "geneSelect.namesThatStartWithCuteLetters".
> >
> > I hope this makes things more clear. I did not include this detail
> > because this is probably boring (I guess most of you have stopped
> > reading by now :-).
> >
> >
> >> Using the list forces you to think about what functions may be called and thinking about things before doing them is usually a good idea.  Personally I don't trust the user of my functions (usually my future self who has forgotten something that seemed obvious at the time) to not do something stupid with them.
> >>
> >> With list elements you can have names for the functions and access them either by the name or by a number; I find that a lot easier when I go back to edit/update than to remember which function f.1 or f.2 did what.
> >>
> >
> > But I don't see how having your functions as list elements is easier
> > (especially if the function is longer than 2 to 3 lines) than having
> > all functions systematically named things such as:
> >
> > geneSelect.Fratio
> > geneSelect.Random
> > geneSelect.LetterA
> > etc
> >
> > Of course, I could have a list with the components named "Fratio"
> > "Random", "LetterA". But I fail to see what it adds. And it forces me
> > to build the list, and probably rebuild it when (or not build it until)
> > the user enters her/his own selection function. But the latter I do not
> > need to do with the scheme above.
> >
> >
> >> With your function, what if the user runs:
> >>
> >> > g(5,3)
> >>
> >> What should it do?  (you have only shown definitions for f.1 and f.2).  With my luck I would accidentally type that and just happen to have an f.3 function sitting around from a previous project that does something that I really don't want it to do now.  If I use the list approach then I will get a subscript out of bounds error rather than running something unintended.
> >>
> >>
> >
> > I see the general concern, but not how it applies here. If I pass
> > argument "Fratio" then either I use geneSelect.Fratio or I get an
> > error if "geneSelect.Fratio" does not exist. Similar to what would
> > happen if I do
> >
> > g1(2, 8)
> >
> > when f.8 is not defined:
> >
> > Error in eval(expr, envir, enclos) : object "f.8" not found
> >
> > So even in more general cases, except for function redefinitions, etc,
> > you are not able to call non-existent stuff.
> >
> >> 2nd, If I used the eval-parse approach then I would probably at some point redefine f.1 or f.2 to the output of a regression analysis or something, then go back and run the g function at a later time and wonder why I am getting an error, then once I have finally figured it out, now I need to remember what f.1 did and rewrite it again.  I am much less likely to accidentally replace an element of a list, and if the list is well named I am unlikely to replace the whole list by accident.
> >>
> >>
> >
> > Yes, that is true. Again, it does not apply to the actual case I have
> > in mind, but of course, without the detailed info on context I just
> > gave, you could not know that.
> >
> >
> >> 3rd, If I ever want to use this code somewhere else (new version of R, on the laptop, give to coworker, ...), it is a lot easier to save and load a single list than to try to think of all the functions that need to be saved.
> >>
> >
> > Oh, sure. But all the functions above live in a single file (actually,
> > a minipackage) except for the optional user function (which is read
> > from a file).
> >
> >
> >>
> >> Personally I have never regretted trying not to underestimate my own future stupidity.
> >>
> >
> > Neither have I. And actually, that is why I asked: if Thomas Lumley
> > said, in the fortune, that I had better rethink it, then I should
> > try rethinking it. But I asked because I failed to see what the
> > problem is.
> >
> >
> >> Hope this helps,
> >>
> >
> > It certainly does.
> >
> >
> > Best,
> >
> > R.
> >
> >
> >> --
> >> Gregory (Greg) L. Snow Ph.D.
> >> Statistical Data Center
> >> Intermountain Healthcare
> >> greg.snow at intermountainmail.org
> >> (801) 408-8111
> >>
> >>
> >>
> >> > -----Original Message-----
> >> > From: r-help-bounces at stat.math.ethz.ch
> >> > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ramon
> >> > Diaz-Uriarte
> >> > Sent: Friday, January 05, 2007 11:41 AM
> >> > To: Peter Dalgaard
> >> > Cc: r-help; rdiaz02 at gmail.com
> >> > Subject: Re: [R] eval(parse(text vs. get when accessing a function
> >> >
> >> > On Friday 05 January 2007 19:21, Peter Dalgaard wrote:
> >> > > Ramon Diaz-Uriarte wrote:
> >> > > > Dear All,
> >> > > >
> >> > > > I've read Thomas Lumley's fortune "If the answer is parse() you
> >> > > > should usually rethink the question.". But I am not sure if that
> >> > > > also applies (and why) to other situations (Lumley's comment
> >> > > > http://tolstoy.newcastle.edu.au/R/help/05/02/12204.html
> >> > > > was in reply to accessing a list).
> >> > > >
> >> > > > Suppose I have similarly named functions, except for a postfix. E.g.
> >> > > >
> >> > > > f.1 <- function(x) {x + 1}
> >> > > > f.2 <- function(x) {x + 2}
> >> > > >
> >> > > > And sometimes I want to call f.1 and some other times f.2 inside
> >> > > > another function. I can either do:
> >> > > >
> >> > > > g <- function(x, fpost) {
> >> > > >     calledf <- eval(parse(text = paste("f.", fpost, sep = "")))
> >> > > >     calledf(x)
> >> > > >     ## do more stuff
> >> > > > }
> >> > > >
> >> > > >
> >> > > > Or:
> >> > > >
> >> > > > h <- function(x, fpost) {
> >> > > >     calledf <- get(paste("f.", fpost, sep = ""))
> >> > > >     calledf(x)
> >> > > >     ## do more stuff
> >> > > > }
> >> > > >
> >> > > >
> >> > > > Two questions:
> >> > > > 1) Why is the second better?
> >> > > >
> >> > > > 2) By changing g or h I could use "do.call" instead; why would that
> >> > > > be better? Because I can handle differences in argument lists?
> >> >
> >> > Dear Peter,
> >> >
> >> > Thanks for your answer.
> >> >
> >> > >
> >> > > Who says that they are better?  If the question is how to call a
> >> > > function specified by half of its name, the answer could well be to
> >> > > use parse(), the point is that you should rethink whether that was
> >> > > really the right question.
> >> > >
> >> > > Why not instead, e.g.
> >> > >
> >> > > f <- list("1"=function(x) {x + 1} , "2"=function(x) {x + 2})
> >> > > h <- function(x, fpost) f[[fpost]](x)
> >> > >
> >> > > > h(2,"2")
> >> > >
> >> > > [1] 4
> >> > >
> >> > > > h(2,"1")
> >> > >
> >> > > [1] 3
> >> > >
> >> >
> >> > I see, this is a direct way of dealing with the problem.
> >> > However, you first need to build the f list, and you might
> >> > not know about that ahead of time. For instance, I might build
> >> > my function g so that the only thing you need to do to use it
> >> > is to name your function "f.something" and then pass the
> >> > "something".
> >> >
> >> > I am still under the impression that, given your answer,
> >> > using "eval(parse(text" is not your preferred way.  What are
> >> > the possible problems (if there are any, that is)? I guess I
> >> > am puzzled by "rethink whether that was really the right question".
> >> >
> >> >
> >> > Thanks,
> >> >
> >> > R.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > > > Thanks,
> >> > > >
> >> > > >
> >> > > > R.
> >> >
> >> > --
> >> > Ramón Díaz-Uriarte
> >> > Centro Nacional de Investigaciones Oncológicas (CNIO)
> >> > (Spanish National Cancer Center) Melchor Fernández Almagro, 3
> >> > 28029 Madrid (Spain)
> >> > Fax: +-34-91-224-6972
> >> > Phone: +-34-91-224-6900
> >> >
> >> > http://ligarto.org/rdiaz
> >> > PGP KeyID: 0xE89B3462
> >> > (http://ligarto.org/rdiaz/0xE89B3462.asc)
> >> >
> >> >
> >> >
> >> >
> >> > ______________________________________________
> >> > R-help at stat.math.ethz.ch mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide
> >> > http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >> >
> >>
> >>
> >
> >
> > --
> > Ramon Diaz-Uriarte
> > Statistical Computing Team
> > Structural Biology and Biocomputing Programme
> > Spanish National Cancer Centre (CNIO)
> > http://ligarto.org/rdiaz
> >
>
> --
> Martin T. Morgan
> Bioconductor / Computational Biology
> http://bioconductor.org
>

-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz