[R] strange date problem - May 3, 1992 is NA

Brian Diggs diggsb at ohsu.edu
Fri Jun 24 21:38:07 CEST 2011


On 6/22/2011 2:28 PM, Alexander Shenkin wrote:
> On 6/22/2011 4:09 PM, Brian Diggs wrote:
>> On 6/22/2011 1:37 PM, Alexander Shenkin wrote:
>>> On 6/22/2011 3:34 PM, Brian Diggs wrote:
>>>> On 6/22/2011 12:09 PM, Luke Miller wrote:
>>>>> For what it's worth, I cannot reproduce this problem under a nearly
>>>>> identical instance of R (R 2.12.1, Win 7 Pro 64-bit). I also can't
>>>>> reproduce the problem with R 2.13.0. You've got something truly weird
>>>>> going on with your particular instance of R.
>>>>>
>>>>>
>>>>>> is.na(strptime("5/3/1992", format="%m/%d/%Y"))
>>>>> [1] FALSE
>>>>>> is.na(strptime("5/2/1992", format="%m/%d/%Y"))
>>>>> [1] FALSE
>>>>>> sessionInfo()
>>>>> R version 2.12.1 (2010-12-16)
>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=English_United States.1252
>>>>> [2] LC_CTYPE=English_United States.1252
>>>>> [3] LC_MONETARY=English_United States.1252
>>>>> [4] LC_NUMERIC=C
>>>>> [5] LC_TIME=English_United States.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>> [1] rj_0.5.2-1      lattice_0.19-17
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] grid_2.12.1  rJava_0.8-8  tools_2.12.1
>>>>
>>>> Like Luke, I cannot reproduce what you see in (an old installation of)
>>>> R 2.12.1 (and it also didn't have rj, lattice, grid, rJava, or tools
>>>> attached or loaded in any way).
>>>>
>>>> My vague gut feeling is that it might be a time zone or daylight saving
>>>> time issue (though usually times, not just dates, have to be involved).
>>>> At least, that is a common cause of weird behavior with dates.
>>>>
>>>> What do you get as output for the following?
>>>>
>>>> Sys.timezone()
>>>> Sys.info()
>>>> conflicts()
>>>> dput(strptime("5/3/1992", format="%m/%d/%Y"))
>>>> dput(as.POSIXct(strptime("5/3/1992", format="%m/%d/%Y")))
>>>> dput(strptime("5/2/1992", format="%m/%d/%Y"))
>>>> dput(as.POSIXct(strptime("5/2/1992", format="%m/%d/%Y")))
>>>
>>>> Sys.timezone()
>>> [1] "COT"
>>>> Sys.info()
>>>                      sysname                      release
>>>                    "Windows"                      "7 x64"
>>>                      version                     nodename
>>> "build 7601, Service Pack 1"               "machine_name"
>>>                      machine                        login
>>>                        "x86"                   "username"
>>>                         user
>>>                   "username"
>>>> conflicts()
>>> [1] "untangle.specials" "body<-"            "format.pval"
>>> [4] "round.POSIXt"      "trunc.POSIXt"      "units"
>>>> dput(strptime("5/3/1992", format="%m/%d/%Y"))
>>> structure(list(sec = 0, min = 0L, hour = 0L, mday = 3L, mon = 4L,
>>>       year = 92L, wday = 0L, yday = 123L, isdst = -1L), .Names = c("sec",
>>> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
>>> ), class = c("POSIXlt", "POSIXt"))
>>>> dput(as.POSIXct(strptime("5/3/1992", format="%m/%d/%Y")))
>>> structure(NA_real_, class = c("POSIXct", "POSIXt"), tzone = "")
>>>> dput(strptime("5/2/1992", format="%m/%d/%Y"))
>>> structure(list(sec = 0, min = 0L, hour = 0L, mday = 2L, mon = 4L,
>>>       year = 92L, wday = 6L, yday = 122L, isdst = 0L), .Names = c("sec",
>>> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
>>> ), class = c("POSIXlt", "POSIXt"))
>>>> dput(as.POSIXct(strptime("5/2/1992", format="%m/%d/%Y")))
>>> structure(704782800, class = c("POSIXct", "POSIXt"), tzone = "")
>>>
>>
>> Fun :)
>>
>> So, not being familiar with COT, I looked it up to see when the
>> daylight saving time switchovers are/were.
>>
>> http://www.timeanddate.com/worldclock/timezone.html?n=41&syear=1990
>>
>> Daylight saving time started (in 1992 only) at "Midnight between
>> Saturday, May 2 and Sunday, May 3" and ended (in 1993) at "Midnight
>> between Saturday, April 3 and Sunday, April 4". In particular, the clock
>> went from Saturday, May 2, 1992 11:59:59 PM straight to Sunday, May 3,
>> 1992 1:00:00 AM, so there was no midnight on May 3.  When strptime
>> converts a date with no time given, it sets the time to midnight by
>> default.  That local time does not exist under the DST rules (which is
>> why isdst gets set to -1), so converting it to a POSIXct gives NA.
>>
>> There are probably a lot of places in R that assume midnight is a valid
>> time, and so I don't know what all will or will not work in that
>> timezone (you probably will also have problems with seq and cut on
>> POSIXct/POSIXlt's in that timezone at least).  I'd recommend using a
>> different timezone.  Or, if you don't need times, using Date (which
>> doesn't have timezones and so avoids this):
>>
>> as.Date("5/3/1992", format="%m/%d/%Y")
>
> Thanks for your detective work, Brian!  Nice one.  I am now using
> "date", and so _my_ problem is solved.  However, it must be the case
> that others have and will continue to run across this problem (and
> perhaps won't even realize it, thus tossing away data).  Indeed, it
> seems like there are quite a number of places that have DST switching at
> midnight:
> http://www.google.com/search?q=Midnight+site%3Ahttp%3A%2F%2Fwww.timeanddate.com%2Fworldclock%2Ftimezone.html
> .  I assume all these timezones would come across a similar problem as mine?
>
> What would be the best route to try to get this smoothed over in R-core?

No one else has chimed in, so I filed a bug report:

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14615
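
For anyone who lands here with the same symptom, here is a minimal sketch
of the workarounds from the quoted message below. The helper name
parse_day is made up for illustration, and "America/Bogota" is my
assumption for the Olson name behind "COT". What the last line prints
depends on the platform and R version, so no output is shown for it.

```r
# Hypothetical helper (not part of base R): parse date-only strings as
# Date, which carries no time of day and so never needs a local midnight.
parse_day <- function(x, format = "%m/%d/%Y") {
  as.Date(x, format = format)
}

d <- parse_day(c("5/2/1992", "5/3/1992"))
print(d)

# If a date-time class is really needed, parsing in UTC sidesteps local
# DST rules. Note that this changes the meaning of the value: it is
# midnight UTC, not local midnight.
p <- as.POSIXct(strptime("5/3/1992", format = "%m/%d/%Y", tz = "UTC"))
print(p)

# To check what a given installation does with the missing local
# midnight, name the zone explicitly instead of relying on the system
# setting. "America/Bogota" is an assumed zone name (see above).
x <- as.POSIXct(strptime("5/3/1992", format = "%m/%d/%Y",
                         tz = "America/Bogota"))
print(is.na(x))
```

Passing tz explicitly also makes a report like this reproducible on
machines set to other time zones, which is why nobody else on the thread
could see the NA.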

>>
>>>>
>>>>
>>>>> On Wed, Jun 22, 2011 at 2:40 PM, Alexander Shenkin
>>>>> <ashenkin at ufl.edu> wrote:
>>>>>> On 6/22/2011 1:34 PM, Sarah Goslee wrote:
>>>>>>> On Wed, Jun 22, 2011 at 2:28 PM, David Winsemius
>>>>>>> <dwinsemius at comcast.net> wrote:
>>>>>>>>
>>>>>>>> On Jun 22, 2011, at 2:03 PM, Sarah Goslee wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On Wed, Jun 22, 2011 at 11:40 AM, Alexander Shenkin
>>>>>>>>> <ashenkin at ufl.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>> is.na(strptime("5/2/1992", format="%m/%d/%Y"))
>>>>>>>>>>
>>>>>>>>>> [1] FALSE
>>>>>>>>>>>
>>>>>>>>>>> is.na(strptime("5/3/1992", format="%m/%d/%Y"))
>>>>>>>>>>
>>>>>>>>>> [1] TRUE
>>>>>>>>>
>>>>>>>>> I can't reproduce your problem on R 2.13.0 on linux:
>>>>>>>>
>>>>>>>> I also cannot reproduce it on a Mac with 2.13.0 beta
>>>>>>>
>>>>>>> Which strongly suggests that you should start by upgrading your R
>>>>>>> installation if at all possible.
>>>>>>>
>>>>>>> I'd also recommend trying it on a default R session, with no extra
>>>>>>> packages loaded, and no items in your workspace. It's possible that
>>>>>>> something else is interfering.
>>>>>>>
>>>>>>> On linux, that's achieved by typing R --vanilla at the command line.
>>>>>>> I'm afraid I don't know how to do it for Windows, but it should be
>>>>>>> similarly straightforward.
>>>>>>>
>>>>>> Thanks Sarah.  Still getting the problem.  I should surely upgrade,
>>>>>> but still, not a bad idea to get to the bottom of this, or at least
>>>>>> have it documented as a known issue.  BTW, I'm on Windows 7 Pro x64.
>>>>>>
>>>>>> (running Rgui.exe --vanilla):
>>>>>>
>>>>>>> is.na(strptime("5/3/1992", format="%m/%d/%Y"))
>>>>>> [1] TRUE
>>>>>>
>>>>>>> is.na(strptime("5/2/1992", format="%m/%d/%Y"))
>>>>>> [1] FALSE
>>>>>>
>>>>>>> sessionInfo()
>>>>>> R version 2.12.1 (2010-12-16)
>>>>>> Platform: i386-pc-mingw32/i386 (32-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] LC_COLLATE=English_United States.1252
>>>>>> [2] LC_CTYPE=English_United States.1252
>>>>>> [3] LC_MONETARY=English_United States.1252
>>>>>> [4] LC_NUMERIC=C
>>>>>> [5] LC_TIME=English_United States.1252
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>


-- 
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
