[R] as.Date() results depend on order of data within vector?

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Jan 7 13:36:48 CET 2007


On Sun, 7 Jan 2007, Mark Wardle wrote:

> Dear all,
>
> The as.Date() function appears to give different results depending on
> the order of the vector passed into it.
>
> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
> as.Date(d1)	# gives correct results
> as.Date(d2)	# fails with error (* see below)
>
> This problem does not arise if the dates are NA rather than an empty
> string, but my data is coming via RODBC and I still don't have NAs
> passed across properly.
>
> I might add that I initially noticed this behaviour when using RODBC's
> sqlQuery() function call, and I initially had difficulty explaining why
> one column of dates was passed correctly, but another failed. The
> failing column was a "date of death" column where it was NA ("") for
> most patients.
>
> I've come up with two workarounds that work. The first is to sort the
> data at the SQL level, ensuring the initial record is not null. The
> second is to use sqlQuery() with the as.is=TRUE option, and then do the
> sorting and conversion afterwards.
>
> Is the behaviour of as.Date() shown above as expected/designed?

Yes.  It uses the first non-NA string to choose the format *if you do not 
specify it*.
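
A minimal sketch of the point above, reusing the d2 vector from the 
question: when 'format' is supplied, no element is used to guess it, and 
a string that cannot be parsed against that format comes back as NA.  
The name 'dates' is only illustrative.

```r
# Vector from the original post, with the empty string first.
d2 <- c("", "1900-01-01", "2007-01-01", "2001-05-03")

# Supplying 'format' means as.Date() does not have to guess it from the
# data; "" does not match "%Y-%m-%d", so that element becomes NA.
dates <- as.Date(d2, format = "%Y-%m-%d")

print(dates)
print(is.na(dates))
```

With the format named explicitly, the result should not depend on where 
the empty string sits in the vector.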

The correct work-around is to have invalid strings returned as NA, not 
"".  That is what the 'na.strings' argument does in RODBC (and elsewhere: 
read.table behaves in the same way).
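
As a rough illustration of 'na.strings', here is a sketch using read.csv 
on inline text rather than a real RODBC connection.  The data frame 
'patients' and its columns 'id' and 'dod' are made up for the example; 
the dates are those from the question.

```r
# Hypothetical "date of death" data; the first record has an empty field.
csv_text <- "id,dod
1,
2,1900-01-01
3,2007-01-01
4,2001-05-03"

# na.strings = "" makes empty fields arrive as NA instead of "".
patients <- read.csv(text = csv_text, na.strings = "",
                     stringsAsFactors = FALSE)

# The leading NA is skipped when as.Date() looks for a string from which
# to choose the format, so the conversion succeeds.
patients$dod <- as.Date(patients$dod)

print(patients)
```

The same idea should carry over to sqlQuery(), which accepts 
'na.strings' as well, although that is not run here.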

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


