[R] no non-missing arguments to max; returning -Inf [2(dplyr/mutate()]

Muhuri, Pradip (SAMHSA/CBHSQ) Pradip.Muhuri at samhsa.hhs.gov
Sun Nov 30 16:10:33 CET 2014


Hello,

With dplyr mutate(), the code below creates a new column (oiddate), which is the maximum of the four dates (mrjdate, cocdate, inhdate, haldate). The code seems to produce the results I want (shown below). The issue is that I am getting the following warning message:  1: In max(13113, NA_real_, 14336, NA_real_, na.rm = TRUE) :   no non-missing arguments to max; returning -Inf 2.

Is this warning message harmful? Any hints on how to tweak the code to correct the problem or avoid this message?

Please note that I did not get this warning when I ran the code on the reproducible example data posted to this forum earlier. I am getting it only now, when applying the code to the actual working data file. Thanks to Arun, Mark and others on this forum for their help with tweaking the code in the past. Sorry for not providing a reproducible example this time.

Thanks,

Pradip Muhuri

#################  R script followed by console (log and output) #############
setwd ("H:/R/cis_study")
library(dplyr)
load("xd2012.rdata")
# create a new column of the max date from four dates

 test <- xd2012 %>% 
  rowwise() %>%
  mutate( oidflag= ifelse(mrjflag==1 | cocflag==1 | inhflag==1 | halflag==1, 1, 0),
          oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01')) %>%
  filter(oidflag==1)  %>%
  select( mrjdate, cocdate, inhdate, haldate,  oiddate)
  
head(test)
warnings(2)
    

##########################  below is from the console  ####################
load("xd2012.rdata")
> # create a new column of the max date from four dates
> 
>  test <- xd2012 %>% 
+   rowwise() %>%
+   mutate( oidflag= ifelse(mrjflag==1 | cocflag==1 | inhflag==1 | halflag==1, 1, 0),
+           oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate, na.rm=TRUE), origin='1970-01-01')) %>%
+   filter(oidflag==1)  %>%
+   select( mrjdate, cocdate, inhdate, haldate,  oiddate)
There were 50 or more warnings (use warnings() to see the first 50)
>   
> head(test)
Source: local data frame [6 x 5]

     mrjdate cocdate    inhdate    haldate    oiddate
1 2003-02-22    <NA> 2006-03-10 2005-09-17 2006-03-10
2 2007-12-07    <NA>       <NA>       <NA> 2007-12-07
3 1994-05-15    <NA>       <NA>       <NA> 1994-05-15
4 2003-04-19    <NA>       <NA>       <NA> 2003-04-19
5 2009-11-13    <NA>       <NA>       <NA> 2009-11-13
6 1973-10-08    <NA>       <NA> 1974-01-04 1974-01-04
> warnings(2)
Warning messages:
1: In max(13113, NA_real_, 14336, NA_real_, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf 2
2: In max(13113, NA_real_, 14336, NA_real_, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf 2
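
A note on where this warning comes from, as a minimal base-R sketch: max() with na.rm = TRUE has nothing left to compare when every argument is missing, so it returns -Inf and warns. The likely trigger in the working file is a row in which all four dates are NA (an assumption, since the data are not shown). The helper name safe_max is hypothetical and is not part of dplyr or base R.

```r
# A row in which all four dates are missing (assumed case, not taken from the real data).
all_missing <- c(NA_real_, NA_real_, NA_real_, NA_real_)

# Capture the warning text instead of letting it print.
msg <- NULL
res <- withCallingHandlers(
  max(all_missing, na.rm = TRUE),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
res   # -Inf
msg   # "no non-missing arguments to max; returning -Inf"

# Hypothetical helper: return NA instead of -Inf when every value is missing.
safe_max <- function(...) {
  x <- c(...)
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

safe_max(NA_real_, NA_real_)       # NA, no warning
safe_max(13113, NA_real_, 14336)   # 14336
```

Whether the warning is harmful depends on what happens to the -Inf afterwards: rows with at least one non-missing date are unaffected, but an all-missing row carries -Inf rather than NA into oiddate.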


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
Sent: Monday, November 10, 2014 1:09 PM
To: 'Mark Sharp'
Cc: r-help at r-project.org
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

Mark,

Thank you very much for looking further into this issue. So, the "ugly" solution is better! Would you like to bring to Hadley's attention that mutate does not set the is.NA value for the new column?

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-----Original Message-----
From: Mark Sharp [mailto:msharp at TxBiomed.org] 
Sent: Monday, November 10, 2014 12:23 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help at r-project.org
Subject: Re: [R] range () does not remove NA's with complete.cases() for dates (dplyr/mutate)

Pradip,

For some reason mutate is not setting the is.NA value for the new column. Note the output below using your data structures.

> ## It looks at first as if the second element of both columns are NA.
> data2$mrjdate[2]
[1] NA
> data2$oiddate[2]
[1] NA
> ## for convenience
> mrj <- data2$mrjdate[2]
> oid <- data2$oiddate[2]
> mode(mrj)
[1] "numeric"
> mode(oid)
[1] "numeric"
> str(mrj)
 Date[1:1], format: NA
> str(oid)
 Date[1:1], format: NA
> class(mrj)
[1] "Date"
> class(oid)
[1] "Date"
> ## But note:
> identical(mrj, oid)
[1] FALSE
> all.equal(mrj, oid)
[1] "'is.NA' value mismatch: 0 in current 1 in target"
## functioning code
data2$mrjdate[2]
data2$oiddate[2]
mrj <- data2$mrjdate[2]
oid <- data2$oiddate[2]
mode(mrj)
mode(oid)
str(mrj)
str(oid)
class(mrj)
class(oid)
# But note:
identical(mrj, oid)
all.equal(mrj, oid)
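
A possible explanation for the mismatch, offered as a sketch rather than a confirmed diagnosis: for the all-missing row, max(..., na.rm = TRUE) returns -Inf, and as.Date() turns that into a Date whose underlying number is -Inf. In the session above such a value displays as NA, yet is.na() does not treat it as missing, so complete.cases() keeps it. The snippet below builds the two values by hand (it does not use data2) to show the difference and one way to map non-finite dates back to NA.

```r
# -Inf is what max(..., na.rm = TRUE) returns for an all-missing row.
oid <- as.Date(-Inf, origin = "1970-01-01")
mrj <- as.Date(NA)

class(oid)                # "Date", same class as mrj
is.na(mrj)                # TRUE: a genuinely missing date
is.na(oid)                # FALSE: the stored number is -Inf, not NA
is.finite(unclass(oid))   # FALSE

# Replace non-finite dates with NA so that is.na() and complete.cases() see them.
oid[!is.finite(unclass(oid))] <- NA
is.na(oid)                # TRUE
```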

## This ugly solution does not have the problem.
> data3 <- data1
> data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
+   if (all(is.na(unlist(data1[row, -1])))) {
+     max_d <- NA
+   } else {
+     max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
+   }
+   max_d}),
+   origin = "1970-01-01")
>
> range(data3$mrjdate[complete.cases(data3$mrjdate)])
[1] "2004-11-04" "2009-10-24"
> range(data3$cocdate[complete.cases(data3$cocdate)])
[1] "2005-08-10" "2011-10-05"
> range(data3$inhdate[complete.cases(data3$inhdate)])
[1] "2005-07-07" "2011-10-13"
> range(data3$haldate[complete.cases(data3$haldate)])
[1] "2007-11-07" "2011-11-04"
> range(data3$oiddate[complete.cases(data3$oiddate)])
[1] "2006-09-01" "2011-11-04"
>
Working code below.

data3 <- data1
data3$oiddate <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

range(data3$mrjdate[complete.cases(data3$mrjdate)])
range(data3$cocdate[complete.cases(data3$cocdate)])
range(data3$inhdate[complete.cases(data3$inhdate)])
range(data3$haldate[complete.cases(data3$haldate)])
range(data3$oiddate[complete.cases(data3$oiddate)])
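
As an alternative to the row-by-row loop, base R's pmax() computes the element-wise maximum across vectors and, with na.rm = TRUE, leaves NA where every input is NA. This is a swapped-in technique, not something proposed in the thread. The sketch rebuilds data1 from the reproducible example with data.frame() instead of read.table(), and the name data4 is hypothetical.

```r
# data1 rebuilt from the reproducible example in the original post.
data1 <- data.frame(
  id      = as.character(1:7),
  mrjdate = as.Date(c("2004-11-04", NA, "2009-10-24", "2007-10-10",
                      "2006-09-01", "2007-09-04", "2005-10-25")),
  cocdate = as.Date(c("2008-07-18", NA, NA, NA,
                      "2005-08-10", "2011-10-05", NA)),
  inhdate = as.Date(c("2005-07-07", NA, "2011-10-13", NA, NA, NA, NA)),
  haldate = as.Date(c("2007-11-07", NA, NA, NA, NA, NA, "2011-11-04")),
  stringsAsFactors = FALSE
)

# Element-wise maximum of the four date columns; an all-NA row stays NA.
data4 <- data1
data4$oiddate <- pmax(data1$mrjdate, data1$cocdate,
                      data1$inhdate, data1$haldate, na.rm = TRUE)

print(data4)
range(data4$oiddate, na.rm = TRUE)
```

Because the result is already a Date vector, no as.Date(..., origin = ...) conversion is needed, and row 2 comes out as a true NA.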


On Nov 10, 2014, at 10:10 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
>
> range() with complete.cases() removes NAs for the date variables that are read from a data frame.  However, the same approach does not remove NAs for the other date variable, which is created using dplyr/mutate().  The console output and the reproducible example are given below. Any advice on how to resolve this issue would be appreciated.
>
> Thanks,
>
> Pradip Muhuri
>
>
> #################  cut and pasted from the R console ####################
>
> id    mrjdate    cocdate    inhdate    haldate    oiddate
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>>
>> # range of dates
>>
>> range(data2$mrjdate[complete.cases(data2$mrjdate)])
> [1] "2004-11-04" "2009-10-24"
>> range(data2$cocdate[complete.cases(data2$cocdate)])
> [1] "2005-08-10" "2011-10-05"
>> range(data2$inhdate[complete.cases(data2$inhdate)])
> [1] "2005-07-07" "2011-10-13"
>> range(data2$haldate[complete.cases(data2$haldate)])
> [1] "2007-11-07" "2011-11-04"
>> range(data2$oiddate[complete.cases(data2$oiddate)])
> [1] NA           "2011-11-04"
>
>
> ################  reproducible code #############################
>
> library(dplyr)
> library(lubridate)
> library(zoo)
> # data object - description of the
>
> temp <- "id  mrjdate cocdate inhdate haldate
> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
> 2             NA         NA         NA         NA
> 3     2009-10-24         NA 2011-10-13         NA
> 4     2007-10-10         NA         NA         NA
> 5     2006-09-01 2005-08-10         NA         NA
> 6     2007-09-04 2011-10-05         NA         NA
> 7     2005-10-25         NA         NA 2011-11-04"
>
> # read the data object
>
> data1 <- read.table(textConnection(temp),
>                    colClasses=c("character", "Date", "Date", "Date", "Date"),
>                    header=TRUE, as.is=TRUE
>                    )
>
>
> # create a new column
>
> data2 <- data1 %>%
>     rowwise() %>%
>      mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>                                                               na.rm=TRUE), origin='1970-01-01'))
>
> # print records
>
> print (data2)
>
> # range of dates
>
> range(data2$mrjdate[complete.cases(data2$mrjdate)])
> range(data2$cocdate[complete.cases(data2$cocdate)])
> range(data2$inhdate[complete.cases(data2$inhdate)])
> range(data2$haldate[complete.cases(data2$haldate)])
> range(data2$oiddate[complete.cases(data2$oiddate)])
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

