[R] delete data row

William Dunlap wdunlap at tibco.com
Mon Oct 18 17:37:46 CEST 2010


I see that which(condition) and subset(data,condition)
both treat NA's in condition the same as FALSE's.  This
leads people to use those functions for their NA-treating
properties instead of for their main functionality (which
may not be the best way to get things done).
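
To make the NA handling concrete, here is a minimal sketch.  The
vector cond and the data frame dat are made-up illustrative values,
not anything from earlier in the thread.

```r
# Illustrative data (assumed for this example only)
cond <- c(TRUE, NA, FALSE, TRUE)
dat  <- data.frame(id = 1:4, cond = cond)

idx <- which(cond)         # positions of the TRUE's; the NA is skipped
sub <- subset(dat, cond)   # rows where cond is TRUE; the NA row is dropped

print(idx)
print(sub)
```

In both calls the NA entry is handled exactly as the FALSE entry is.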

I wonder how much code would break if x[condition],
for a logical condition, returned
only the elements of x corresponding to TRUE's in
the condition (instead of also returning NA's for NA's
in condition).  How much currently broken code would
start working?  I suspect omitting the NA's from the
output might be better.
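
A small sketch of the difference, using made-up values.  The
"proposed" line only emulates the TRUE-only behavior with existing
operators; it is not a change to "[" itself.

```r
# Illustrative values (assumed for this example only)
x    <- c(10, 20, 30, 40)
cond <- c(TRUE, NA, FALSE, TRUE)

current  <- x[cond]                  # an NA in cond gives an NA in the result
proposed <- x[!is.na(cond) & cond]   # keeps only elements where cond is TRUE

print(current)
print(proposed)
```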

I don't know about the case of NA's in integer
subscripts.  This would affect things in the logical
case because x[NA] would use the logical case and
x[c(1,NA)] the integer.
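
The two cases can be told apart by the length of the result.  A
short sketch with an arbitrary three-element vector:

```r
x <- c(10, 20, 30)

a <- x[NA]            # NA is logical, so it is recycled to length(x)
b <- x[c(1, NA)]      # numeric subscript: one NA in, one NA out
d <- x[NA_integer_]   # an integer NA selects a single (missing) element

print(length(a))
print(b)
print(length(d))
```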

As for your VisTRUE, the following is much faster
but silently coerces non-logical arguments to logicals.
  VisTRUE2 <- function(x) !is.na(x) & x
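
A sketch comparing the two approaches on the example from the
thread.  The names slow_true and fast_true are hypothetical, chosen
for this illustration; fast_true uses the elementwise & so that it
applies to a whole vector.

```r
x <- 1:10
y <- suppressWarnings(log(x - 5))         # NaN for x < 5, -Inf at x == 5

slow_true <- Vectorize(isTRUE)            # calls isTRUE() once per element
fast_true <- function(v) !is.na(v) & v    # one vectorized expression

r_slow <- x[slow_true(y > -Inf)]
r_fast <- x[fast_true(y > -Inf)]

# The coercion caveat: numeric input is silently treated as logical
coerced <- fast_true(c(0, 2, NA))
strict  <- slow_true(c(0, 2, NA))         # isTRUE() is FALSE for non-logicals

print(r_slow)
print(r_fast)
print(coerced)
print(strict)
```

The two versions agree on logical input and differ only when the
argument is not logical to begin with.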

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: Joshua Wiley [mailto:jwiley.psych at gmail.com] 
> Sent: Sunday, October 17, 2010 6:27 PM
> To: David Winsemius
> Cc: William Dunlap; R-help at r-project.org
> Subject: Re: [R] delete data row
> 
> I used the -which() construct initially to try to show "deleting"
> cases.  I believe it hung around longer than it should have.  That
> said, I have also had David's experience with NAs.  What about a
> vectorized version of identical(TRUE, x)?  This avoids the which()
> problem Bill pointed out, and the NA issue David mentioned.  Does it
> introduce new problems?
> 
> x <- 1:10
> y <- log(x-5)
> VisTRUE <- Vectorize(isTRUE)
> x[VisTRUE(y > -Inf)]
> 
> Josh
> 
> 
> On Sun, Oct 17, 2010 at 4:38 PM, David Winsemius 
> <dwinsemius at comcast.net> wrote:
> >
> > On Oct 17, 2010, at 3:56 PM, William Dunlap wrote:
> >
> >>
> >>> I had been thinking of:
> >>>>
> >>>> x <- c(1, (2^(0.5))^2 , 3, 5, (2^(0.5))^2 , 3, 1)
> >>>> y <- 2
> >>>> x[-which(zapsmall(x-y) == 0)]
> >>>
> >>> [1] 1 3 5 3 1
> >>
> >> Using which() to convert logicals into integer
> >> subscripts is almost always unnecessary and often wrong.
> >
> > At one time I believed that too. However, in the situation where the
> > test produces NA rather than a numeric value when one is indexing in
> > the first argument, I have had the unpleasant experience of pages of
> > useless and frustrating to understand output because of this "feature".
> >
> > I learned to either use which() in the first argument to "[" or to use
> > subset to avoid inadvertent "returns" from logical indexing.
> >
> >> x <- 1:10
> >> y <- log(x-5)
> > Warning message:
> > In log(x - 5) : NaNs produced
> >> x[y>-Inf]
> > [1] NA NA NA NA  6  7  8  9 10
> >
> >> x[which(y>-Inf)]
> > [1]  6  7  8  9 10
> >
> > If that test were used in a dataframe indexing, the entire line might
> > come back as a "result".
> >
> >
> >
> >> In this case it fails when no x is close to y,
> >> because integer(0) is the same thing as -integer(0):
> >>
> >>> x[-which(zapsmall(x-10) == 0)]
> >>
> >>  numeric(0)
> >>
> >> The whichless version, using logical subscripts,
> >> works (in this case we want all of x):
> >>
> >>> x[zapsmall(x-10)!=0]
> >>
> >>  [1] 1 2 3 5 2 3 1
> >
> > Maybe the rule should be don't use the -which construction:
> >
> >> x <- c(1, (2^(0.5))^2 , 3, 5, (2^(0.5))^2 , 3, 1)
> >> y <- 2
> >> x[which(zapsmall(x-10) != 0)]
> > [1] 1 2 3 5 2 3 1
> >
> > --
> > David.
> >>
> >> When using logicals as subscripts, read the "["
> >> as "such that".
> >>
> >> Bill Dunlap
> >> Spotfire, TIBCO Software
> >> wdunlap tibco.com
> >
> >
> 
> -- 
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://www.joshuawiley.com/
> 


