[R] How to globally convert NaN to NA in dataframe?

Peter Dalgaard pdalgd using gmail.com
Fri Sep 3 11:51:55 CEST 2021


Yes, even

> summary(NA_real_)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
     NA      NA      NA     NaN      NA      NA       1 

which is presumably because the mean is an empty sum (= 0) divided by a zero count, and 0/0 = NaN.
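
A small sketch of that arithmetic, assuming summary() drops the missing values before it averages (the variable names are only illustrative):

```r
# Illustrative only: reproduce the "empty sum over zero count" by hand.
x <- NA_real_
kept <- x[!is.na(x)]      # nothing is left once the NA is dropped
length(kept)              # 0
sum(kept)                 # 0, the sum of an empty vector
sum(kept) / length(kept)  # NaN, since 0/0 is NaN
```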

Notice also the difference between

> mean(NA_real_)
[1] NA
> mean(NA_real_, na.rm=TRUE)
[1] NaN
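
For the thread quoted below, this means a column that is entirely NA can still show "Mean : NaN" in summary() even after every NaN in the data has been replaced. A minimal sketch, assuming a small made-up data frame `dat` and a hypothetical helper `mean_or_na()` (neither comes from the thread). Note that the function passed to lapply() has to return the modified column; otherwise lapply() collects the value of the assignment, which is NA:

```r
# Made-up data: y is entirely missing, z contains two NaN.
dat <- data.frame(y = rep(NA_real_, 5), z = c(1, NaN, 3, NaN, 5))

# Replace NaN by NA column by column; assigning into dat[] keeps dat a data frame.
dat[] <- lapply(dat, function(col) {
  col[is.nan(col)] <- NA
  col                      # return the column, not the assignment's value
})

any(sapply(dat, function(col) any(is.nan(col))))  # FALSE: no NaN left
mean(dat$y, na.rm = TRUE)                         # NaN: nothing left to average
mean(dat$z, na.rm = TRUE)                         # 3

# Hypothetical helper: report NA instead of NaN when every value is missing.
mean_or_na <- function(v) {
  if (all(is.na(v))) NA_real_ else mean(v, na.rm = TRUE)
}
mean_or_na(dat$y)  # NA
mean_or_na(dat$z)  # 3
```

Whether NA or NaN is the more useful result for an all-missing column is a matter of taste; the helper only illustrates one option.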


> On 3 Sep 2021, at 09:59 , Luigi Marongiu <marongiu.luigi using gmail.com> wrote:
> 
> Fair enough, I'll check the actual data to see if there are indeed any
> NaN (which there should not be, since the data are categories, not
> generated by math).
> Thanks!
> 
> On Fri, Sep 3, 2021 at 8:26 AM PIKAL Petr <petr.pikal using precheza.cz> wrote:
>> 
>> Hi Luigi.
>> 
>> Weird. But maybe it is the desired behaviour of summary when calculating
>> mean of numeric column full of NAs.
>> 
>> See example
>> 
>> dat <- data.frame(x=rep(NA, 110), y=rep(1, 110), z= rnorm(110))
>> 
>> # change all values in second column to NA
>> dat[,2] <- NA
>> # change some of them to NaN
>> dat[5:6, 2:3] <- 0/0
>> 
>> # see summary
>> summary(dat)
>>    x                 y             z
>> Mode:logical   Min.   : NA   Min.   :-1.9798
>> NA's:110       1st Qu.: NA   1st Qu.:-0.4729
>>                Median : NA   Median : 0.1745
>>                Mean   :NaN   Mean   : 0.1856
>>                3rd Qu.: NA   3rd Qu.: 0.8017
>>                Max.   : NA   Max.   : 2.5075
>>                NA's   :110   NA's   :2
>> 
>> # change NaN values to NA
>> dat[sapply(dat, is.nan)] <- NA
>> *************************
>> 
>> # summary is the same
>> summary(dat)
>>    x                 y             z
>> Mode:logical   Min.   : NA   Min.   :-1.9798
>> NA's:110       1st Qu.: NA   1st Qu.:-0.4729
>>                Median : NA   Median : 0.1745
>>                Mean   :NaN   Mean   : 0.1856
>>                3rd Qu.: NA   3rd Qu.: 0.8017
>>                Max.   : NA   Max.   : 2.5075
>>                NA's   :110   NA's   :2
>> 
>> # but no NaN value in data
>> dat[1:10,]
>>    x  y          z
>> 1  NA NA -0.9148696
>> 2  NA NA  0.7110570
>> 3  NA NA -0.1901676
>> 4  NA NA  0.5900650
>> 5  NA NA         NA
>> 6  NA NA         NA
>> 7  NA NA  0.7987658
>> 8  NA NA -0.5225229
>> 9  NA NA  0.7673103
>> 10 NA NA -0.5263897
>> 
>> So my "nice compact command"
>> dat[sapply(dat, is.nan)] <- NA
>> 
>> works as expected, but summary still gives NaN as the mean.
>> 
>> Cheers
>> Petr
>> 
>>> -----Original Message-----
>>> From: R-help <r-help-bounces using r-project.org> On Behalf Of Luigi Marongiu
>>> Sent: Thursday, September 2, 2021 3:46 PM
>>> To: Andrew Simmons <akwsimmo using gmail.com>
>>> Cc: r-help <r-help using r-project.org>
>>> Subject: Re: [R] How to globally convert NaN to NA in dataframe?
>>> 
>>> `data[sapply(data, is.nan)] <- NA` is a nice compact command, but I still
>>> get NaN when using the summary function, for instance one of the columns
>>> gives:
>>> ```
>>> Min.   : NA
>>> 1st Qu.: NA
>>> Median : NA
>>> Mean   :NaN
>>> 3rd Qu.: NA
>>> Max.   : NA
>>> NA's   :110
>>> ```
>>> I tried to implement the second solution but:
>>> ```
>>> df <- lapply(x, function(xx) {
>>>  xx[is.nan(xx)] <- NA
>>> })
>>> > str(df)
>>> List of 1
>>> $ sd_ef_rash_loc___palm: logi NA
>>> ```
>>> What am I getting wrong?
>>> Thanks
>>> 
>>> On Thu, Sep 2, 2021 at 3:30 PM Andrew Simmons <akwsimmo using gmail.com>
>>> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> 
>>>> I would use something like:
>>>> 
>>>> 
>>>> x <- c(1:5, NaN) |> sample(100, replace = TRUE) |> matrix(10, 10) |>
>>>>    as.data.frame()
>>>> x[] <- lapply(x, function(xx) {
>>>>    xx[is.nan(xx)] <- NA_real_
>>>>    xx
>>>> })
>>>> 
>>>> 
>>>> This prevents attributes from being changed in 'x', but accomplishes the
>>>> same thing as you have above. I hope this helps!
>>>> 
>>>> On Thu, Sep 2, 2021 at 9:19 AM Luigi Marongiu <marongiu.luigi using gmail.com> wrote:
>>>>> 
>>>>> Hello,
>>>>> I have some NaN values in some elements of a dataframe that I would
>>>>> like to convert to NA.
>>>>> The command `df1$col[is.nan(df1$col)]<-NA` works column-wise.
>>>>> Is there an alternative that modifies all instances globally at
>>>>> once?
>>>>> I have seen from
>>>>> https://stackoverflow.com/questions/18142117/how-to-replace-nan-value-with-zero-in-a-huge-data-frame/18143097#18143097
>>>>> that one could use:
>>>>> ```
>>>>> 
>>>>> is.nan.data.frame <- function(x)
>>>>> do.call(cbind, lapply(x, is.nan))
>>>>> 
>>>>> data123[is.nan(data123)] <- 0
>>>>> ```
>>>>> replacing 0 with NA, but I got
>>>>> ```
>>>>> str(df)
>>>>>> logi NA
>>>>> ```
>>>>> when modifying my dataframe df.
>>>>> What would be the correct syntax?
>>>>> Thank you
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best regards,
>>>>> Luigi
>>>>> 
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> Luigi
>>> 
> 
> 
> 
> -- 
> Best regards,
> Luigi
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com


