[R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

Muhuri, Pradip (SAMHSA/CBHSQ) Pradip.Muhuri at samhsa.hhs.gov
Sun Nov 9 05:40:33 CET 2014


Hello,



The example data frame in the reproducible code below has 5 columns (1 column for id and 4 columns for dates), and there are 7 observations.  I would like to insert the most recent date from those 4 date columns into a new column (oiddate) using the mutate() function in the dplyr package.   I am getting correct results (NA in the new column) if a given row has all NA's in the four columns.  However, the issue is that the date value inserted into the new column (oidflag) is incorrect for 5 of the remaining 6 rows (with a non-NA value in at least 1 of the four columns).



I would appreciate receiving your help toward resolving the issue.  Please see the R console and the R script (reproducible example)below.



Thanks in advance.



Pradip





######  from the console ########

print (data2)

  id    mrjdate    cocdate    inhdate    haldate    oidflag

1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2011-11-04

2  2       <NA>       <NA>       <NA>       <NA>       <NA>

3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-11-04

4  4 2007-10-10       <NA>       <NA>       <NA> 2011-11-04

5  5 2006-09-01 2005-08-10       <NA>       <NA> 2011-11-04

6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-11-04

7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04





##################  Reproducible code and data #####################################



library(dplyr)

library(lubridate)

library(zoo)

# data object - description of the



temp <- "id  mrjdate cocdate inhdate haldate

1     2004-11-04 2008-07-18 2005-07-07 2007-11-07

2             NA         NA         NA         NA

3     2009-10-24         NA 2011-10-13         NA

4     2007-10-10         NA         NA         NA

5     2006-09-01 2005-08-10         NA         NA

6     2007-09-04 2011-10-05         NA         NA

7     2005-10-25         NA         NA 2011-11-04"



# read the data object



data1 <- read.table(textConnection(temp),

                    colClasses=c("character", "Date", "Date", "Date", "Date"),

                    header=TRUE, as.is=TRUE

                    )

# create a new column



data2 <- mutate(data1,

                oidflag= ifelse(is.na(mrjdate) & is.na(cocdate) & is.na(inhdate)  & is.na(haldate), NA,

                                  max(mrjdate, cocdate, inhdate, haldate,na.rm=TRUE )

                                )

                )



# convert to date

data2$oidflag = as.Date(data2$oidflag, origin="1970-01-01")



# print records



print (data2)





Pradip K. Muhuri, PhD

SAMHSA/CBHSQ

1 Choke Cherry Road, Room 2-1071

Rockville, MD 20857

Tel: 240-276-1070

Fax: 240-276-1260





	[[alternative HTML version deleted]]



More information about the R-help mailing list