[R] Adding elements in an array where I have missing data.

Berton Gunter gunter.berton at gene.com
Tue May 2 19:26:05 CEST 2006


> 
> 
> --- Berton Gunter <gunter.berton at gene.com> wrote:
> 
> > > 
> > > Here are a few alternatives:
> > > 
> > > replace(a, is.na(a), 0) + b
> > > 
> > > ifelse(is.na(a), 0, a) + b
> > > 
> > > mapply(sum, a, b, MoreArgs = list(na.rm = TRUE))
> > > 
> > 
> > Well, Gabor, if you want to get fancy...
> > 
> > evalq({a[is.na(a)]<-0;a})+b
> 
> It's going into my tips file but what does it mean??
> Thanks
> 
> 
Note 1: The following is probably more arcane than most R users would care
to be bothered with. You have been forewarned.

Note 2: I am far from an expert on this so I would appreciate public
correction of any errors in the following.

Well, "what it means" is "explained" in the man page for evalq, but to
understand it you have to understand expression evaluation in R (or, really,
in any computer language). Basically, my understanding is as follows: when R
sees a series of characters like

a + b

it goes through roughly the following steps to figure out what to do (the
situation is actually more complicated because of method dispatch, but I'll
ignore this):

1) R creates a parse tree -- equivalently, a list -- with root "+" and 2
leaves, a and b. 

2) R now by default needs to evaluate the symbols "a" and "b" (as names, not
character strings). It uses its lexical scoping procedures to do this. That
is, it uses lexical scoping to decide where to look up the name-value pairs
whose names are a and b. See the R FAQ 3.3.1, ?environment, or the R
Language Definition Manual for more on this (also V & R's S PROGRAMMING has
a nice discussion of this).  

3) It now substitutes the values for a and b into the parse tree (or issues
an error message if none can be found, etc.). This is what is meant by "the
arguments are evaluated before being passed to the evaluator." 

4) This parse tree is now passed to the evaluator which adds the values (or,
in general, calls the appropriate method, I think -- I'm fuzzy on exactly
how method dispatch occurs here) and returns the result to R.
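
To make steps 1-4 concrete, here is a small sketch for the console. The
values of a and b are made up (the thread does not give the data), and
quote() is used only to hold the parse tree still so it can be inspected;
R does not literally call quote() when you type a + b.

```r
# Illustrative values only; not taken from the original question.
a <- c(1, NA, 3)
b <- c(10, 20, 30)

# Step 1: capture the parse tree of a + b without evaluating it.
e <- quote(a + b)
parts <- as.list(e)   # three elements: the symbols `+`, a and b

# Steps 2-3: look up the two leaf symbols in the current environment.
a_val <- eval(parts[[2]])
b_val <- eval(parts[[3]])

# Step 4: evaluate the whole call; this gives the same value as typing a + b.
result <- eval(e)
print(result)
```

The NA in the second position of the result is the problem the thread
started with.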

So how does this apply to the above? Well, the stuff in the curly braces is
an "expression" that ordinarily would be parsed and evaluated and its value
substituted into the left node of the "+" parse tree (the overall expression
[] + b ). As part of this evaluation, 0 would be substituted for the
missings in a and the changed a would be saved in the Global environment
(or, more generally, whatever the enclosing environment is).  However, evalq
protects its argument from that evaluation, so that the whole expression
is passed **as an expression** -- i.e. an unevaluated parse tree -- to the
left node of the "+" parse tree. The right node symbol, b, would be
evaluated, since it's not so protected. This entire parse tree with "+" at
its root node is then passed to the evaluator for evaluation. It is
evaluated there locally -- that is, the changes in a are made only on a
local copy of a in the evaluator, not on the a in the global environment --
and the resulting value is returned **without having changed a.**
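
Corrections were invited above, so here is a sketch for checking that last
claim (that the original vector is left untouched) rather than taking it on
faith. ?eval documents evalq(expr, envir) as eval(quote(expr), envir), with
envir defaulting to parent.frame(). Typed at top level, that appears to mean
the assignment inside the braces is carried out in the calling environment,
so the vector does end up changed there. local() is shown for comparison
because it evaluates its argument in a fresh environment. The data values
are made up.

```r
# Illustrative values only; not taken from the original question.
a <- c(1, NA, 3)
b <- c(10, 20, 30)

# evalq() with its default envir evaluates the braces in the caller's frame.
res_evalq <- evalq({a[is.na(a)] <- 0; a}) + b
a_after_evalq <- a        # inspect a after the call

# Reset a, then try the same expression with local() for comparison.
a <- c(1, NA, 3)
res_local <- local({a[is.na(a)] <- 0; a}) + b
a_after_local <- a        # inspect a after the call

print(res_evalq); print(a_after_evalq)
print(res_local); print(a_after_local)
```

Both sums come out the same; the two forms differ only in what a looks like
afterwards, which is the point to check against the description above.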

HTH.

Cheers,
Bert



