[Rd] as.Date nuance

Vladimir Dergachev vdergachev at rcgardis.com
Mon Mar 26 23:08:18 CEST 2007


On Saturday 24 March 2007 12:12 pm, Gabor Grothendieck wrote:
> It matches in the sense of grep or regexpr
>
> grep("a", "ab") > 0
> regexpr("a", "ab") > 0
>
> Try this:
>
> x <- c("2006-01-01error", "2006-01-01")
> as.Date(x, "%Y-%m-%d") + ifelse(regexpr("^....-..-..$", x) > 0, 0, NA)
>

Still, I would have expected as.Date() to behave the way as.integer() and 
as.numeric() do: return NA and produce a warning.

After poking around in the code I also noticed that the format guess is made 
from the first element only:

> as.Date(c("2006", "2006-01-01"))
Error in fromchar(x) : character string is not in a standard unambiguous format

> as.Date(c("2006-01-01", "2006"))
[1] "2006-01-01" NA

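I attached a patch that changes do_strptime to behave like coerceToInteger: 
input that fails to parse, or that has characters left over after the format 
has been matched, becomes NA with a warning. Please let me know if it is 
reasonable. I'll then see about getting as.Date() to work correctly.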

                                thank you

                                       Vladimir Dergachev

Index: src/main/datetime.c
===================================================================
--- src/main/datetime.c	(revision 40895)
+++ src/main/datetime.c	(working copy)
@@ -818,9 +818,9 @@
 SEXP attribute_hidden do_strptime(SEXP call, SEXP op, SEXP args, SEXP env)
 {
     SEXP x, sformat, ans, ansnames, klass, stz, tzone;
-    int i, n, m, N, invalid, isgmt = 0, settz = 0;
+    int i, n, m, N, invalid, isgmt = 0, settz = 0, warn = 0;
     struct tm tm, tm2;
-    char *tz = NULL, oldtz[20] = "";
+    char *tz = NULL, oldtz[20] = "", *p;
     double psecs = 0.0;
 
     checkArity(op, args);
@@ -859,10 +859,15 @@
 	tm.tm_year = tm.tm_mon = tm.tm_mday = tm.tm_yday = 
 	    tm.tm_wday = NA_INTEGER;
 	tm.tm_isdst = -1;
-	invalid = STRING_ELT(x, i%n) == NA_STRING ||
-	    !R_strptime(CHAR(STRING_ELT(x, i%n)),
-			CHAR(STRING_ELT(sformat, i%m)), &tm, &psecs);
+	invalid = STRING_ELT(x, i%n) == NA_STRING;
 	if(!invalid) {
+	    p = R_strptime(CHAR(STRING_ELT(x, i%n)),
+			   CHAR(STRING_ELT(sformat, i%m)), &tm, &psecs);
+	    invalid = (p == NULL) || (*p != '\0');
+	    warn |= invalid;
+	}
+
+	if(!invalid) {
 	    /* Solaris sets missing fields to 0 */
 	    if(tm.tm_mday == 0) tm.tm_mday = NA_INTEGER;
 	    if(tm.tm_mon == NA_INTEGER || tm.tm_mday == NA_INTEGER
@@ -901,6 +906,8 @@
     }
     if(settz) reset_tz(oldtz);
 
+    if(warn) warning(_("NAs introduced by coercion"));
+
     UNPROTECT(3);
     return ans;
 }
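
For anyone who wants to try the core check outside of R, here is a minimal 
sketch of the same idea. It assumes a POSIX system and uses the standard 
strptime() in place of R's internal R_strptime(); the helper name 
parses_completely is made up for illustration and is not part of R or of the 
patch. strptime() returns a pointer to the first character it did not consume, 
so a string matches the whole format only if that pointer is non-NULL and 
points at the terminating '\0'.

```c
#define _XOPEN_SOURCE 700   /* ask <time.h> to declare strptime() */
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not part of R): returns 1 only if `format`
 * consumes all of `s`; returns 0 if parsing fails outright or if
 * unparsed characters are left over at the end of `s`. */
int parses_completely(const char *s, const char *format)
{
    struct tm tm;
    const char *rest;

    memset(&tm, 0, sizeof tm);
    rest = strptime(s, format, &tm);

    /* NULL means no match at all; a non-empty remainder means that
     * only a prefix of the string matched the format. */
    return rest != NULL && *rest == '\0';
}
```

With the format "%Y-%m-%d" this should accept "2006-01-01" and reject both 
"2006-01-01error" (trailing text) and "2006" (parse failure), which are the 
two cases the patch folds into a single invalid flag.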


