[Rd] [R] NaN, Inf to NA

Duncan Murdoch murdoch.duncan at gmail.com
Fri May 27 17:33:18 CEST 2011


On 27/05/2011 11:11 AM, Martin Maechler wrote:
> >>>>>  Duncan Murdoch<murdoch.duncan at gmail.com>
> >>>>>      on Fri, 27 May 2011 08:23:14 -0400 writes:
>
>      >  On 11-05-27 4:27 AM, Albert-Jan Roskam wrote:
>      >>  Aha! Thank you very much for that clarification! It would
>      >>  be much more user friendly if R generated a
>      >>  NotImplementedError or something similar. The 'garbage
>      >>  results' are pretty misleading, esp. to a novice.
>
>      >  I think that's a good idea.  The default methods are
>      >  documented to work on atomic vectors; dataframes are not
>      >  atomic vectors, so it would be reasonable to generate an
>      >  error.  (See ?is.atomic for a definition of atomic
>      >  vectors.)
>
>      >  I'll see if this causes a lot of trouble...
>
>      >  Duncan Murdoch
>
> Duncan,
> do you remember the issue of mean(), var(), median(), etc.,
> that was the topic a few weeks ago?
>
> I strongly advocated that  mean.data.frame() should become
> *deprecated*, and I would propose the same for the functions
> mentioned here.

I think you may have misunderstood my proposal.  Currently is.nan, 
is.finite and is.infinite have no data.frame methods, so the default 
method is used.  The problem is that the default method is too 
permissive:  it operates on the data.frame by treating it as a list; 
then it returns FALSE for each list element.  (If there is only one row, 
it applies the test to the singleton in the column.)   This is pretty 
strange default behaviour.
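Applying is.nan() to each column explicitly gives the original poster what they expected: a logical matrix shaped like the data frame, which is what is.na(df) already returns. Below is a minimal sketch. The helper name `nan_matrix` is hypothetical, and it assumes that non-numeric columns should simply count as "not NaN".

```r
df <- data.frame(a = c(NA, NaN, Inf, 1:3), b = letters[1:6])

# Hypothetical helper (not part of base R): apply is.nan() column by column
# and return a logical matrix with the same dimensions as the data frame.
# Non-numeric columns are reported as all FALSE without calling is.nan().
nan_matrix <- function(df) {
  m <- vapply(df, function(col) {
    if (is.numeric(col)) is.nan(col) else rep(FALSE, length(col))
  }, logical(nrow(df)))
  dim(m) <- dim(df)            # vapply() returns a plain vector for one row
  dimnames(m) <- dimnames(df)  # row names and column names, as in is.na(df)
  m
}

nan_matrix(df)
```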

What I'm proposing is that the default method should trigger an error if 
you try to send it anything that's not atomic.  This gives sensible 
behaviour in most cases; the only one where it doesn't work is a list of 
singletons, which used to be handled sensibly, but will now fail.
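The proposed behaviour can be approximated at user level with a wrapper. This is only an illustration, and it assumes that is.atomic() is the test applied. `strict_is_nan` is a hypothetical name, not a function in R, and it does not reflect how R's own default method is implemented.

```r
# Hypothetical wrapper sketching the proposal: refuse anything that is not
# an atomic vector instead of quietly treating a data frame as a list.
strict_is_nan <- function(x) {
  if (!is.atomic(x)) {
    stop("is.nan() expects an atomic vector, not an object of class '",
         class(x)[1L], "'", call. = FALSE)
  }
  is.nan(x)
}

strict_is_nan(c(1, NaN, Inf))          # FALSE TRUE FALSE
try(strict_is_nan(data.frame(a = 1)))  # error: a data frame is not atomic
try(strict_is_nan(list(NaN, 1)))       # the list-of-singletons case fails too
```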

(There's still a question about what the answer should be for these 
functions when applied to character or raw vectors, which are both 
atomic.  I'm leaning towards returning FALSE for every element, which 
matches the current behaviour, but perhaps those should also generate an 
error.)

I think this partially addresses Bill's objection, but not completely.  
Someone could still put a class on an atomic vector, and that might not 
be handled properly by the default method.
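The gap is easy to demonstrate. Classed objects such as factors and Dates are atomic, so an is.atomic() test alone would pass them through to the default method. The sketch below uses `is_plain_atomic`, a hypothetical stricter guard that also rejects anything carrying a class attribute. It is offered only to illustrate the point, not as a proposed fix.

```r
f <- factor(c("x", "y"))
d <- as.Date("2011-05-27")

is.atomic(f)   # TRUE: a factor is an integer vector with a class
is.atomic(d)   # TRUE: a Date is a double vector with a class
is.object(f)   # TRUE: it carries a class attribute

# Hypothetical stricter guard: accept plain (unclassed) atomic vectors only.
is_plain_atomic <- function(x) is.atomic(x) && !is.object(x)

is_plain_atomic(c(1, NaN))  # TRUE
is_plain_atomic(f)          # FALSE
is_plain_atomic(d)          # FALSE
```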

> People should  *apply (or *ply) on data frames, and not expect
> that all kind of functions have data.frame methods
> which are basically equivalent to sapply(<df>, <function>)
>
> {and yes -- all this belongs to R-devel rather than R-help}

Where I've moved it now.
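Following Martin's advice, the original task can be done column by column. That task is to recode NaN and Inf to NA across a data frame that also has character columns. apply() is a poor fit here because it first converts such a data frame to a character matrix, so the numeric tests never see numbers. Below is a minimal sketch. It assumes that only numeric columns should be recoded, and it uses the is.na<- idiom from Marc's reply. `recode_nonfinite` is a hypothetical name.

```r
df <- data.frame(a = c(NA, NaN, Inf, 1:3),
                 b = c("u", "v", "w", "x", "y", "z"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: recode NaN, Inf and -Inf to NA in a numeric column;
# any other column (here the character column b) is returned unchanged.
recode_nonfinite <- function(x) {
  if (is.numeric(x)) {
    is.na(x) <- !is.finite(x)
  }
  x
}

# lapply() works on each column separately, and `df[] <-` keeps the
# data.frame structure instead of producing a list or a matrix.
df[] <- lapply(df, recode_nonfinite)
df  # column a is now NA NA NA 1 2 3; column b is unchanged
```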

Duncan Murdoch
> Martin
>
>      >>  I wanted to recode every NaN and Inf value of an entire
>      >>  data.frame to NA. The data.frame also includes character
>      >>  variables. So the following might work (?)  (Can't test
>      >>  it here)
>      >>
>      >>  ditch<- function(x) ifelse(is.infinite(x) | is.nan(x), NA, x)
>      >>  df<- apply(df, 2, ditch)
>      >>
>      >>  ________________________________
>      >>  From: William Dunlap <wdunlap at tibco.com>
>      >>  Cc: R Mailing List <r-help at r-project.org>
>      >>  Sent: Fri, May 27, 2011 12:57:01 AM
>      >>  Subject: RE: [R] NaN, Inf to NA
>      >>
>      >>  I think the source of the OP's problem is that while
>      >>  things like df>30 and is.na(df) return a logical matrix
>      >>  with the dimensions of the data.frame df, both
>      >>  is.infinite(df) and is.nan(df) return a logical vector as
>      >>  long as the number of columns of df.  (`>` and is.na have
>      >>  data.frame methods but is.infinite and is.nan do not: the
>      >>  latter give garbage results for data.frames.)
>      >>
>      >>  Bill Dunlap
>      >>  Spotfire, TIBCO Software
>      >>  wdunlap tibco.com
>      >>
>      >>>  -----Original Message-----
>      >>>  From: r-help-bounces at r-project.org
>      >>>  [mailto:r-help-bounces at r-project.org] On Behalf Of Marc Schwartz
>      >>>  Sent: Thursday, May 26, 2011 2:15 PM
>      >>>  To: Albert-Jan Roskam
>      >>>  Cc: R Mailing List
>      >>>  Subject: Re: [R] NaN, Inf to NA
>      >>>
>      >>>  On May 26, 2011, at 3:18 PM, Albert-Jan Roskam wrote:
>      >>>
>      >>>>  Hi,
>      >>>>
>      >>>>  I want to recode all Inf and NaN values to NA, but I'm
>      >>>>  surprised to see the result of the following code. Could
>      >>>>  anybody enlighten me about this?
>      >>>>
>      >>>>>  df<- data.frame(a=c(NA, NaN, Inf, 1:3))
>      >>>>>  df[is.infinite(df) | is.nan(df)]<- NA
>      >>>>>  df
>      >>>>      a
>      >>>>  1  NA
>      >>>>  2 NaN
>      >>>>  3 Inf
>      >>>>  4   1
>      >>>>  5   2
>      >>>>  6   3
>      >>>>>
>      >>>>
>      >>>>  Thanks!
>      >>>>
>      >>>>  Cheers!!  Albert-Jan
>      >>>
>      >>>
>      >>>  The canonical way is to use is.na() to assign the NA
>      >>>  value based upon a condition. See ?is.na for more
>      >>>  information.
>      >>>
>      >>>  is.na(df$a)<- !is.finite(df$a)
>      >>>
>      >>>>  df
>      >>>     a
>      >>>  1 NA
>      >>>  2 NA
>      >>>  3 NA
>      >>>  4  1
>      >>>  5  2
>      >>>  6  3
>      >>>
>      >>>
>      >>>  HTH,
>      >>>
>      >>>  Marc Schwartz
>      >>>
>      >>>  ______________________________________________
>      >>>  R-help at r-project.org mailing list
>      >>>  https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
>      >>>  read the posting guide
>      >>>  http://www.R-project.org/posting-guide.html and provide
>      >>>  commented, minimal, self-contained, reproducible code.
>      >>>
>      >>


