[R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Daniel Nordlund djnordlund at frontier.com
Sun Nov 9 20:26:59 CET 2014


On 11/9/2014 3:05 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> Hi Dan,
>
> Thank you so much for sending me your code; it gives me the desired results. But I don't understand why I am getting the following warning message: In FUN(newX[, i], ...) : no non-missing arguments, returning NA. Any thoughts?
>
> Regards,
>
> Pradip
>
>
>
> data2x <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))
>
> Warning message:
> In FUN(newX[, i], ...) : no non-missing arguments, returning NA
>> data2x
>    id    mrjdate    cocdate    inhdate    haldate    oidflag
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>
>
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel Nordlund
> Sent: Sunday, November 09, 2014 5:33 AM
> To: r-help at r-project.org
> Subject: Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)
>
> On 11/8/2014 8:40 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
>> Hello,
>>
>> The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates) and 7 observations. I would like to insert the most recent date from those 4 date columns into a new column (oidflag) using the mutate() function in the dplyr package. I get the correct result (NA in the new column) when a row has NAs in all four columns. However, the date inserted into the new column is incorrect for 5 of the remaining 6 rows (rows with a non-NA value in at least 1 of the four columns): all six receive 2011-11-04, which is correct only for id 7.
>>
>> I would appreciate your help in resolving the issue. Please see the R console output and the R script (reproducible example) below.
>>
>> Thanks in advance.
>>
>> Pradip
>>
>> ######  from the console ########
>>
>> print (data2)
>>    id    mrjdate    cocdate    inhdate    haldate    oidflag
>> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04
>> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
>> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04
>> 4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04
>> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04
>> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04
>> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
>>
>> ##################  Reproducible code and data #####################################
>>
>> library(dplyr)
>> library(lubridate)
>> library(zoo)
>>
>> # data object - description of the
>>
>> temp <- "id  mrjdate cocdate inhdate haldate
>> 1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
>> 2             NA         NA         NA         NA
>> 3     2009-10-24         NA 2011-10-13         NA
>> 4     2007-10-10         NA         NA         NA
>> 5     2006-09-01 2005-08-10         NA         NA
>> 6     2007-09-04 2011-10-05         NA         NA
>> 7     2005-10-25         NA         NA 2011-11-04"
>>
>> # read the data object
>> data1 <- read.table(textConnection(temp),
>>                     colClasses=c("character", "Date", "Date", "Date", "Date"),
>>                     header=TRUE, as.is=TRUE
>>                     )
>>
>> # create a new column
>> data2 <- mutate(data1,
>>                 oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,
>>                                 max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )
>>                                 )
>>                 )
>>
>> # convert to date
>> data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")
>>
>> # print records
>> print (data2)
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> I am not familiar with the mutate() function from dplyr, but you can get the results you want as follows:
>
> data2 <- within(data1, oidflag <- apply(data1[,-1], 1, max, na.rm=TRUE))
>
>
> Hope this is helpful,
>
> Dan
>
> Daniel Nordlund
> Bothell, WA USA
>
>

It means what it says.  In this case, for id=2 there are no non-missing 
values.  Since na.rm was set to TRUE, the NAs were dropped, and with 
nothing left to take the max of, max() warns you that it is returning NA.
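A short sketch of where that warning comes from, assuming the same data1 as above. Because the four columns are Date objects, apply() first converts data1[,-1] to a character matrix, so max() is comparing strings (this only gives the right answer because ISO "YYYY-MM-DD" strings sort in date order). For a character vector with nothing left after na.rm=TRUE, max() returns NA with exactly this warning; for a numeric vector it would return -Inf instead. The helper max_or_na() is hypothetical, not part of any package: it simply skips max() when every value in the row is missing, and the character result is converted back to Date.

```r
# Example data from the thread (read.table turns the "NA" strings into missing values).
temp <- "id  mrjdate cocdate inhdate haldate
1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
2             NA         NA         NA         NA
3     2009-10-24         NA 2011-10-13         NA
4     2007-10-10         NA         NA         NA
5     2006-09-01 2005-08-10         NA         NA
6     2007-09-04 2011-10-05         NA         NA
7     2005-10-25         NA         NA 2011-11-04"
data1 <- read.table(text = temp, header = TRUE,
                    colClasses = c("character", rep("Date", 4)))

# apply() works on a matrix; the Date columns become character strings.
print(typeof(as.matrix(data1[, -1])))

# Hypothetical helper: NA (and no warning) for an all-missing row,
# otherwise the largest ISO date string in the row.
max_or_na <- function(x) {
  if (all(is.na(x))) NA_character_ else max(x, na.rm = TRUE)
}

# Row-wise maximum, converted back from character to Date.
data2x <- within(data1, oidflag <- as.Date(apply(data1[, -1], 1, max_or_na)))
print(data2x)
```

Wrapping the original call in suppressWarnings() would also silence the message, but it would hide any other warnings raised in the same call.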

Dan

-- 
Daniel Nordlund
Bothell, WA USA
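As for the original mutate() call: max() is a summary function, so max(mrjdate, cocdate, inhdate, haldate, na.rm=TRUE) collapses all 28 values into a single date (2011-11-04), and ifelse() recycles that one value down every row that is not all NA. ifelse() also drops the Date class, which is why the as.Date(..., origin="1970-01-01") step was needed. The element-wise counterpart of max() is pmax(). The sketch below uses base R's transform() so that it runs without dplyr; the same pmax() expression should work unchanged as the right-hand side in mutate(data1, oidflag = ...).

```r
# Same example data as in the thread.
temp <- "id  mrjdate cocdate inhdate haldate
1     2004-11-04 2008-07-18 2005-07-07 2007-11-07
2             NA         NA         NA         NA
3     2009-10-24         NA 2011-10-13         NA
4     2007-10-10         NA         NA         NA
5     2006-09-01 2005-08-10         NA         NA
6     2007-09-04 2011-10-05         NA         NA
7     2005-10-25         NA         NA 2011-11-04"
data1 <- read.table(text = temp, header = TRUE,
                    colClasses = c("character", rep("Date", 4)))

# max() returns a single value for all four columns combined.
overall <- max(data1$mrjdate, data1$cocdate, data1$inhdate, data1$haldate,
               na.rm = TRUE)
print(overall)

# pmax() takes the maximum position by position, i.e. within each row.
# With na.rm = TRUE missing dates are ignored; a row that is all NA stays NA
# without a warning, and the result keeps the Date class.
data2 <- transform(data1,
                   oidflag = pmax(mrjdate, cocdate, inhdate, haldate,
                                  na.rm = TRUE))
print(data2)
```

Because pmax() returns a Date vector here, no ifelse() guard for all-NA rows and no origin-based conversion are needed.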


