[Rd] [R] NaN, Inf to NA

Duncan Murdoch murdoch.duncan at gmail.com
Fri May 27 18:37:22 CEST 2011


On 27/05/2011 11:53 AM, Prof Brian Ripley wrote:
> On Fri, 27 May 2011, Duncan Murdoch wrote:
>
> >  On 27/05/2011 11:11 AM, Martin Maechler wrote:
> >>  >>>>>   Duncan Murdoch<murdoch.duncan at gmail.com>
> >>  >>>>>       on Fri, 27 May 2011 08:23:14 -0400 writes:
> >>
> >>       >   On 11-05-27 4:27 AM, Albert-Jan Roskam wrote:
> >>       >>   Aha! Thank you very much for that clarification! It would
> >>       >>   be much more user friendly if R generated a
> >>       >>   NotImplementedError or something similar. The 'garbage
> >>       >>   results' are pretty misleading, esp. to a novice.
> >>
> >>       >   I think that's a good idea.  The default methods are
> >>       >   documented to work on atomic vectors; dataframes are not
> >>       >   atomic vectors, so it would be reasonable to generate an
> >>       >   error.  (See ?is.atomic for a definition of atomic
> >>       >   vectors.)
> >>
> >>       >   I'll see if this causes a lot of trouble...
> >>
> >>       >   Duncan Murdoch
> >>
> >>  Duncan,
> >>  do you remember the issue of mean(), var(), median(),... etc
> >>  that was the topic a few weeks ago ?
> >>
> >>  I strongly advocated that  mean.data.frame() should become
> >>  *deprecated*, and I would propose the same for the functions
> >>  mentioned here.
> >
> >  I think you may have misunderstood my proposal.  Currently is.nan, is.finite
> >  and is.infinite have no data.frame methods, so the default method is used.
> >  The problem is that the default method is too permissive:  it operates on the
> >  data.frame by treating it as a list; then it returns FALSE for each list
> >  element.  (If there is only one row, it applies the test to the singleton in
> >  the column.)   This is pretty strange default behaviour.
> >
> >  What I'm proposing is that the default method should trigger an error if you
> >  try to send it anything that's not atomic.  This gives sensible behaviour in
> >  most cases; the only one where it doesn't work is a list of singletons, which
> >  used to be handled sensibly, but will now fail.
> >
> >  (There's still a question about what the answer should be for these functions
> >  when applied to character or raw vectors, which are both atomic.  I'm leaning
> >  towards returning FALSE for every element, which matches the current
> >  behaviour, but perhaps those should also generate an error.)
>
> I noticed you did not mention integer vectors.  Those are no
> different from character or raw: there are no NaN (nor infinite)
> integer elements.  I don't see why it should be an error to ask in those
> cases.

Right, in those cases I think it's clear that is.finite should return 
TRUE for every element except NA_integer_, and is.infinite and is.nan 
should return FALSE.  I would treat logical the same way since we often 
promote logical to integer in a calculation.
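
A minimal sketch of the results described above, as they come out for 
integer and logical input in an R session (the variable names are only 
for illustration and are not part of any proposal):

```r
# Integer and logical vectors can hold NA, but never NaN or +/-Inf.
x_int <- c(1L, NA_integer_, 3L)
x_lgl <- c(TRUE, NA, FALSE)

fin_int <- is.finite(x_int)    # TRUE except for the NA element
inf_int <- is.infinite(x_int)  # FALSE for every element
nan_int <- is.nan(x_int)       # FALSE for every element, NA included

fin_lgl <- is.finite(x_lgl)    # logical is treated like integer
inf_lgl <- is.infinite(x_lgl)
nan_lgl <- is.nan(x_lgl)

print(rbind(fin_int, inf_int, nan_int))
print(rbind(fin_lgl, inf_lgl, nan_lgl))
```

Note that is.nan() is FALSE for NA here: NA and NaN are distinct, so 
!is.finite(x) is the test that catches NA, NaN and Inf together.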

Duncan Murdoch

> >
> >  I think this partially addresses Bill's objection, but not completely.
> >  Someone could still put a class on an atomic vector, and that might not be
> >  handled properly by the default method.
> >
> >>  People should  *apply (or *ply) on data frames, and not expect
> >>  that all kinds of functions have data.frame methods
> >>  which are simply equivalent to basically  sapply(<df>,<function>)
> >>
> >>  {and yes -- all this belongs to R-devel rather than R-help}
> >
> >  Where I've moved it now.
> >
> >  Duncan Murdoch
> >>  Martin
> >>
> >>       >>   I wanted to recode every NaN and Inf value of an entire
> >>       >>   data.frame to NA. The data.frame also includes character
> >>       >>   variables. So the following might work (?)  (Can't test
> >>       >>   it here)
> >>       >>
> >>       >>   ditch <- function(x) ifelse(is.infinite(x) | is.nan(x), NA, x)
> >>       >>   df <- apply(df, 2, ditch)
> >>       >>
> >>       >>
> >>       >>
> >>       >>
> >>       >>
> >>       >>   ________________________________
> >>       >>   From: William Dunlap<wdunlap at tibco.com>
> >>       >>   Cc: R Mailing List<r-help at r-project.org>
> >>       >>   Sent: Fri, May 27, 2011 12:57:01 AM
> >>       >>   Subject: RE: [R] NaN, Inf to NA
> >>       >>
> >>       >>   I think the source of the OP's problem is that while
> >>       >>   things like df>30 and is.na(df) return a logical matrix
> >>       >>   with the dimensions of the data.frame df, both
> >>       >>   is.infinite(df) and is.nan(df) return a logical vector as
> >>       >>   long as the number of columns of df.  (`>` and is.na have
> >>       >>   data.frame methods but is.infinite and is.nan do not: the
> >>       >>   latter give garbage results for data.frames.)
> >>       >>
> >>       >>   Bill Dunlap
> >>       >>   Spotfire, TIBCO Software
> >>       >>   wdunlap tibco.com
> >>       >>
> >>       >>>   -----Original Message-----
> >>       >>>   From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Marc Schwartz
> >>       >>>   Sent: Thursday, May 26, 2011 2:15 PM
> >>       >>>   To: Albert-Jan Roskam
> >>       >>>   Cc: R Mailing List
> >>       >>>   Subject: Re: [R] NaN, Inf to NA
> >>       >>>
> >>       >>>   On May 26, 2011, at 3:18 PM, Albert-Jan Roskam wrote:
> >>       >>>
> >>       >>>>   Hi,
> >>       >>>>
> >>       >>>>   I want to recode all Inf and NaN values to NA, but I'm
> >>       >>>>   surprised to see the result of the following code. Could
> >>       >>>>   anybody enlighten me about this?
> >>       >>>>
> >>       >>>>>   df <- data.frame(a=c(NA, NaN, Inf, 1:3))
> >>       >>>>>   df[is.infinite(df) | is.nan(df)] <- NA
> >>       >>>>>   df
> >>       >>>>       a
> >>       >>>>   1  NA
> >>       >>>>   2 NaN
> >>       >>>>   3 Inf
> >>       >>>>   4   1
> >>       >>>>   5   2
> >>       >>>>   6   3
> >>       >>>>>
> >>       >>>>
> >>  >>>
> >>  >>>   Thanks!
> >>       >>>>
> >>       >>>>   Cheers!!  Albert-Jan
> >>       >>>
> >>       >>>
> >>       >>>   The canonical way is to use is.na() to assign the NA
> >>       >>>   value based upon a condition. See ?is.na for more
> >>       >>>   information.
> >>       >>>
> >>       >>>   is.na(df$a)<- !is.finite(df$a)
> >>       >>>
> >>       >>>>   df
> >>       >>>       a
> >>       >>>   1  NA
> >>       >>>   2  NA
> >>       >>>   3  NA
> >>       >>>   4   1
> >>       >>>   5   2
> >>       >>>   6   3
> >>       >>>
> >>       >>>
> >>       >>>   HTH,
> >>       >>>
> >>       >>>   Marc Schwartz
> >>       >>>
> >>       >>>   ______________________________________________
> >>       >>>   R-help at r-project.org mailing list
> >>       >>>   https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
> >>       >>>   read the posting guide
> >>       >>>   http://www.R-project.org/posting-guide.html and provide
> >>       >>>   commented, minimal, self-contained, reproducible code.
> >>       >>>
> >>       >>
> >>
> >
> >  ______________________________________________
> >  R-devel at r-project.org mailing list
> >  https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>


