[R] x %>% y as an alternative to which( x > y)

William Dunlap wdunlap at tibco.com
Tue Sep 13 23:54:46 CEST 2011


I often use the following function
  is.true <- function(x) !is.na(x) & x
and, less often,
  is.false <- function(x) !is.na(x) & !x
to report if elements of a logical vector are TRUE (not
FALSE or NA) or FALSE (not TRUE or NA), respectively.
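
A minimal sketch of how the two helpers treat NA, using a made-up
vector v (not from the original thread):

```r
is.true  <- function(x) !is.na(x) & x
is.false <- function(x) !is.na(x) & !x

v <- c(TRUE, FALSE, NA)   # hypothetical input

res_true  <- is.true(v)   # TRUE only where v is TRUE; NA becomes FALSE
res_false <- is.false(v)  # TRUE only where v is FALSE; NA becomes FALSE

print(res_true)
print(res_false)
```

Neither result contains NA, so both are safe to use as subscripts.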

Evaluate your complicated logical expression, apply is.true()
to the result before passing it to "[", and you will get the
same rows that subset() gives you, without subset's nonstandard
argument evaluation.  (The latter is handy for interactive
use and often painful in general purpose functions.)

E.g., change your
  (mydataframeName$myvariableName > 2 & !is.na(mydataframeName$myvariableName)) &
   (mydataframeName$myotherVariableName == "male" & !is.na(mydataframeName$myotherVariableName))
to
  is.true(mydataframeName$myvariableName > 2 & mydataframeName$myotherVariableName == "male")
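
As a rough illustration, assuming a small made-up data frame d (the
column names age, sex and flag are hypothetical), the wrapped condition
can be used both for extracting rows and for assigning into them:

```r
is.true <- function(x) !is.na(x) & x

# Hypothetical data, not from the original thread
d <- data.frame(age = c(1, 3, NA, 5),
                sex = c("male", NA, "male", "male"),
                stringsAsFactors = FALSE)

# NA wherever a missing value leaves the comparison undecided
cond <- d$age > 2 & d$sex == "male"

raw_rows  <- d[cond, ]           # NA subscripts produce extra all-NA rows
kept_rows <- d[is.true(cond), ]  # only rows where cond is definitely TRUE
sub_rows  <- subset(d, age > 2 & sex == "male")  # nonstandard-evaluation route

# Assignment case: set a value for the matching rows only
d$flag <- FALSE
d[is.true(cond), "flag"] <- TRUE

print(kept_rows)
print(d)
```

Here kept_rows and sub_rows select the same single row, while raw_rows
also carries rows filled with NA.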

Don't confuse this is.true() with the entirely different base::isTRUE(), which
reports whether its argument is identical to TRUE (length 1, no names or other
attributes).
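
A short sketch of the difference, again with a made-up vector v:
is.true() answers element by element, while isTRUE() gives a single
answer about the whole object.

```r
is.true <- function(x) !is.na(x) & x

v <- c(TRUE, NA, FALSE)      # hypothetical input

elementwise <- is.true(v)    # one logical value per element of v
whole       <- isTRUE(v)     # single value: v is not a length-one TRUE
scalar      <- isTRUE(TRUE)  # single value: a bare TRUE qualifies

print(elementwise)
print(whole)
print(scalar)
```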

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Timothy Bates
> Sent: Tuesday, September 13, 2011 2:18 PM
> To: Hadley Wickham; Duncan Murdoch
> Cc: R list
> Subject: Re: [R] x %>% y as an alternative to which( x > y)
> 
> Dear Duncan and Hadley,
> 
> I stumbled across the NA behavior of subset a little while ago and thought it might do the trick. But
> my common usage case is not getting a subsetting sans NAs, but setting values in the whole dataframe.
> 
> So I need T/F at each row, not just the list of rows that match the subset of matching cases...
> 
> How would you do this with subset?
> 
>    data[data$YOB < 1908 & !is.na(data$YOB), "Age"]=NA
> 
> My %<% idea extends the vocabulary established by %in%, and works in the same grammatical situation.
> 
> here's a real example
> 
> # Fix missing T2 sex for same sex pairs...
> 
> twinData[twinData$Age %<% 12, "flynnEffect"] = FALSE # only set flynn F for people under 12, not inc NAs
> 
> Addressing Duncan's point about returning a logical array... the %<% function should be:
> 
> "%<%" <- function(table, x){
> 	lessThan = table < x
> 	lessThan[is.na(lessThan)] = FALSE
> 	return(lessThan)
> }
> 
> This also works for matrices as it should
> 
> > x = matrix(c(1:10,NA,12:20),nrow=2)
> > x %<% 6
>      [,1] [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
> [1,] TRUE TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> [2,] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> 
> 
> On Sep 13, 2011, at 8:40 PM, Hadley Wickham wrote:
> 
> >> Because in coding, I often end up with big chunks looking like this:
> >>
> >> ((mydataframeName$myvariableName > 2 & !is.na(mydataframeName$myvariableName)) & (mydataframeName$myotherVariableName == "male" & !is.na(mydataframeName$myotherVariableName)))
> >>
> >> Which is much less readable/maintainable/editable than
> >>
> >> mydataframeName$myvariableName > 2 & mydataframeName$myotherVariableName == "male"
> >
> > Use subset:
> >
> > subset(mydataframeName, myvariableName > 2 & myotherVariableName == "male")
> >
> > (subset automatically treats NAs as false)
> >
> > Hadley
> >
> > --
> > Assistant Professor / Dobelman Family Junior Chair
> > Department of Statistics / Rice University
> > http://had.co.nz/
> >
> 