[R] Checking for invalid dates: Code works but needs improvement

Paul Miller pjmiller_57 at yahoo.com
Wed Feb 1 15:06:22 CET 2012


Hi Marc,

It is certainly possible that the code I wrote initially is over-engineered. Of course, Rui's solution is a reworking of that code. If starting from scratch, Rui likely would have done something quite different. I focused on Rui's code because it was complete and was a clear improvement over what I had initially.

Your code is great but doesn't do everything I think I need.

This leads to the question of exactly what I do need. I thought I had done well to create sample data and working (albeit inelegant) code. I realize now though that this really wasn't sufficient. It would have been better to supply a list of things I would like the code to do.

So let me try again. Ideally I'd like to have a function called something like "readDates," a call to which might look like: 

readDates(indata=TestDates, outdata=TestDates2, 
          dates=c("birthDT", "diagnosisDT", "metastaticDT"), 
          datefmt="%m/%d/%Y", monthimp=15, 
          mindt="1900-01-01", maxdt=Sys.Date())

Besides allowing the user to specify an input data name, an output data name, the dates to be read, and an incoming date format, the readDates function would:

1. Impute by default the 15th of the month if the day is 'un', 'unk', 'Un', 'Unk', 'UN', etc., but allow the user to select another value such as the 1st.

2. Reject by default dates before 1900-01-01 or after the current date, but allow the user to specify other values.

3. Ignore dates with month or year values of 'un', 'unk', 'Un', 'Unk', 'UN', etc. That is, set them to missing but not report them as part of a warning message.

4. Reject dates with components (month, day, or year) that are not of the correct length. In most cases, I think this would involve lengths of 2, 2, and 4. For some date formats though (e.g., 05Jan2012), this might not be the case.

5. Print warning messages for invalid dates something like: 

Warning: Invalid date values in birthDT 

11/23/21931 
06/20/1840 
06/31/1933 

Warning: Invalid date values in diagnosisDT 

02/30/2010 
05/16/2015 
 
6. Convert to a date any input columns that do not have invalid dates. This would include columns with unknown month and year values, like my "metastaticDT." 

7. Allow things like the date format and minimum and maximum date values to vary by input column.
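
To make items 1 through 6 a bit more concrete, here is a rough sketch of how they might fit together. It is only an illustration under several assumptions, not a finished readDates: it handles only slash-separated "%m/%d/%Y" input (so there is no datefmt argument), it returns the converted data frame instead of taking an outdata argument, and the helper name parseDateColumn is made up. The small TestDates data frame at the end is also invented for the example and is not my original sample data. Item 7 (per-column formats and limits) is not covered.

```r
# Hypothetical helper: check one character vector of "mm/dd/yyyy" dates.
# Returns the parsed dates plus the raw values judged invalid.
parseDateColumn <- function(x, monthimp = 15,
                            mindt = as.Date("1900-01-01"),
                            maxdt = Sys.Date()) {
  x <- as.character(x)
  n <- length(x)
  parts <- strsplit(x, "/", fixed = TRUE)
  out <- rep(as.Date(NA), n)
  invalid <- logical(n)
  unk <- "^unk?$"  # 'un' or 'unk'; case is ignored in the grepl() calls

  for (i in seq_len(n)) {
    if (is.na(x[i]) || !nzchar(x[i])) next        # already missing
    p <- parts[[i]]
    if (length(p) != 3) { invalid[i] <- TRUE; next }
    # Item 3: unknown month or year becomes NA without a warning
    if (any(grepl(unk, p[c(1, 3)], ignore.case = TRUE))) next
    # Item 1: unknown day is imputed (15th unless monthimp says otherwise)
    if (grepl(unk, p[2], ignore.case = TRUE)) p[2] <- sprintf("%02d", monthimp)
    # Item 4: components must be all digits with lengths 2, 2 and 4
    if (!all(grepl("^[0-9]+$", p)) || !identical(nchar(p), c(2L, 2L, 4L))) {
      invalid[i] <- TRUE
      next
    }
    d <- as.Date(paste(p, collapse = "/"), format = "%m/%d/%Y")
    # Impossible dates parse to NA; item 2 rejects out-of-range dates
    if (is.na(d) || d < mindt || d > maxdt) {
      invalid[i] <- TRUE
    } else {
      out[i] <- d
    }
  }
  list(dates = out, invalid = x[invalid])
}

# Sketch of the wrapper: warn per column (item 5) and convert only
# the columns that have no invalid values (item 6).
readDates <- function(indata, dates, monthimp = 15,
                      mindt = "1900-01-01", maxdt = Sys.Date()) {
  for (v in dates) {
    res <- parseDateColumn(indata[[v]], monthimp,
                           as.Date(mindt), as.Date(maxdt))
    if (length(res$invalid) > 0) {
      warning("Invalid date values in ", v, "\n",
              paste(res$invalid, collapse = "\n"), call. = FALSE)
    } else {
      indata[[v]] <- res$dates
    }
  }
  indata
}

# Made-up example data, for illustration only
TestDates <- data.frame(
  birthDT      = c("11/23/21931", "06/20/1840", "06/31/1933", "05/12/1960"),
  metastaticDT = c("un/17/2011", "03/unk/2010", "04/22/unk", "01/05/2011"),
  stringsAsFactors = FALSE
)
TestDates2 <- readDates(TestDates, dates = c("birthDT", "metastaticDT"))
str(TestDates2)
```

With this made-up data, birthDT should trigger a warning and stay as character, while metastaticDT should be converted to a Date column with missing values where the month or year was unknown.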

Admittedly, this is a lot. And I wouldn't blame you if you didn't want to touch it with a ten-foot pole. 

It's what's on my wish list though.

Thanks,

Paul


