[R] Problem with strptime generating missing values where none appear to exist

Jonathan Williams Jonathan.Williams at dpag.ox.ac.uk
Tue Feb 23 19:26:01 CET 2010


Dear R Helpers,

I am having difficulty with strptime. I want to find the differences between
two vectors of dates. I can apparently convert both vectors to the appropriate
format with strptime without difficulty, but difftime then fails to calculate
all of the differences.

Here is the code and output:-

dob=strptime(as.character(datx$BDT),'%d-%b-%y'); dob$year=dob$year-100
sdate=strptime(as.character(datx$SDT),'%d-%b-%y')
head(dob); head(sdate)
[1] "1922-07-14" "1922-07-14" "1922-07-14" "1922-07-14" "1921-03-23" "1921-03-23"
[1] "2001-08-27" "2001-08-27" "2001-08-27" "2001-08-27" "2001-08-20" "2001-08-20"
> str(dob)
 POSIXlt[1:9], format: "1922-07-14" "1922-07-14" "1922-07-14" "1922-07-14"
"1921-03-23" "1921-03-23" "1921-03-23" "1927-08-27" "1927-08-27"
"1927-08-27" "1927-08-27" "1940-04-05" "1940-04-05" "1940-04-05"
"1940-04-05" ...
> str(sdate)
 POSIXlt[1:9], format: "2001-08-27" "2001-08-27" "2001-08-27" "2001-08-27"
"2001-08-20" "2001-08-20" "2001-08-20" "2001-11-26" "2001-11-26"
"2001-11-26" "2001-11-26" "2002-05-20" "2002-05-20" "2002-05-20"
"2002-05-20" ...
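The code above reads two-digit years with %y, which strptime() maps to 2000-2068 for "00"-"68" and to 1969-1999 for "69"-"99"; that is why dob$year has to be moved back 100 years. A minimal sketch of the same century shift using Date objects, which carry no time of day, time zone or daylight-saving flag. The sample strings are hypothetical and use numeric months (%m) so the example does not depend on the locale; the real data would keep '%d-%b-%y'. Like the original code, it assumes every birth year is really 19xx.

```r
# %y maps "00"-"68" to 20xx, so "03-04-27" is read as 2027-04-03
raw_dob <- c("14-07-22", "03-04-27", "30-12-40")   # hypothetical sample strings
parsed <- as.Date(raw_dob, format = "%d-%m-%y")

# Assumption (as in the original code): all birth years are 19xx,
# so rewrite the year text 100 years earlier and parse again
yr <- as.integer(format(parsed, "%Y")) - 100L
dob <- as.Date(paste0(yr, format(parsed, "-%m-%d")))

# Subtracting Dates gives a whole number of days, with no DST effects
sdate <- as.Date("2002-02-28")
age_days <- as.numeric(sdate - dob[2])
print(dob)
print(age_days)   # 27360, the same value as the manual check further down
```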

table(is.na(sdate))

FALSE 
  812 

table(is.na(dob))

FALSE  TRUE 
  743    69 

However, if I look at each component of dob separately, none is missing:

for (i in 1:length(dob)) {print(names(dob)[i]);
print(table(is.na(dob[[i]])))}

[1] "sec"

FALSE 
  812 
[1] "min"

FALSE 
  812 
[1] "hour"

FALSE 
  812 
[1] "mday"

FALSE 
  812 
[1] "mon"

FALSE 
  812 
[1] "year"

FALSE 
  812 
[1] "wday"

FALSE 
  812 
[1] "yday"

FALSE 
  812 
[1] "isdst"

FALSE 
  812 

Visual inspection also shows no NA values in any component of dob. For
example, here is dob$mon:
dob$mon
[1]  6  6  6  6  2  2  2  7  7  7  7  3  3  3  3 11 11 11 11  7  7  7  7  7
7  7  7  4  4  4  4  4  4  4  4 11 11 11 11  7  7  7  7 11 11 11 11  6  6  6
6  9  9  9  9  3  3  3  3  6  6  6  6  8  8  8  8  7  7  7  7  4  4  4  4  4
 [77]  4  4  4  7  7  7  7 10 10 11 11 11 11  8  8  8  8  0  0  0  0  0 10
10 10  7  7  7  7  3  3  3  3  2  2  2  2  2  6  6  6  6  6  5  5  5  5  4
4  4  4  4 11 11 11 11  4  4  4  4  4  3  3  3  3  3  7  7  7  7  7  7  7  7
8  8
[153]  8  8  8  8  8  8  7  7  7  7  6  6 10 10 10 10  4  4  4  4  4  4  4
4 10 10 10 10 11 11 11 11  5  5  5  5  5  5  5  5  3  3  3  3  5  5  0  0  0
0  2  2  2  2  6  6  6  6  0  0  0  0  3  3  3  3  6  6  6  6  8  8  8  8  8
7
[229]  7  7  7  7  7  7  7  7  8  8  8  8  4  4  4  4 10 10 10 10  2  2  2
2  0  0  0  0  0  0  1  1  1  1  4  4  4  4  2  2  2  2  2  8  8  8  8 11 11
11 11  8  8  8  8  4  4  4  4  5  5  5  5  8  8  8  8  0  0  0  0  1  1  1
1  1
[305]  1  1  1  4  4  4  4  5  5  5  5  7  7  7  7  5  5  5  5  3  3  3  3
1  1  1  1  0  0  0  0  3  3  3  3  6  6  6  6  3  3  3  5  5 11 11 11  5  5
5  5  0  0  0  0 10 10 10 10  4  4  4  4  6  6  6  6  7  7  7  7  4  4  4  4
1
[381]  1  1  1  7  7  7  7  3  3  3  3  7  7  7  7  5  5  5  5  9  9  9  9
11 11 11 11 10 10 10 10  0  0  0  0  5  5  5  5  3  3  3  3  7  7  7  7  0
0  0  0  6  6  6  6  8  8  8  8  8  8  8  8  3  3  3  3  5  5  5  5 10 10 10
10  3
[457]  3  3  3  8  8  8  8  0  0  0  0 11 11 11 11  2  2  2  2  7  7  7  7
0  0  0  0  0  1  1  1  1  5  5  5  5  7  7  7  7  7  7  7  7  5  5  5  5  9
9  9  9  5  5  5  5  6  6  6  6  8  8  8  8 11 11 11 11  3  3  3  3  6  6  6
6
[533]  3  3  3  3  6  6  6  6  8  8  8  8  9  9  9  9  2  2  2  2  1  1  1
1  2  2  2  2  4  4  4  7  7  7  8  8  8  8  3  3  3  3  1  1  1  1  1  9  9
9  9  8  8  8  8 11 11 11 11  6  6  6  6  3  3  3  3 10 10 10  8  8  8  0  0
0
[609]  0  3  3  3  3  3  0  0  0  0  3  3  3  3  5  5  5  5 10 10 10 10 10
10 10 10  2  2  2  2  2  3  3  3  3  4  4  4 10 10 10 10  2  2  2  2  3  3
3  3  2  2  2  2  2  2  6  6  6  6  4  4  4  4 11 11 11 11  0  0  0  0 11 11
11 11
[685]  5  5  5  5  8  8  8  8  8  8  8  8  7  7  7  7  3  3  3  3  5  5  5
5 11 11 11 11  3  3  3  9  9  5  5  5  5  8  8  8  8  2  2  2  2  5  5  5  5
2  2  2  2 10 10 10 10  4  4  4 11 11 11 11  8  8  8  9  9  9  1  1  1  1  8
8
[761]  8  8  2  2  2 11 11 11 11  2  2  2  2  2  2  2  2  6  6  6  6 11 11
11 11  2  2  2 11 11 11  9  9  9  9  2  2  2  2  7  7  7  7 11 11 11  2  2
2  3  3  3

All the dob components are equally complete, including isdst.

However, when I then try to compute difftime(sdate,dob), 69 values are
missing:-
Time differences in days
  [1] 28899.00 28899.00 28899.00 28899.00 29369.96 29369.96 29369.96
27120.04 27120.04 27120.04 27120.04 22690.00 22690.00 22690.00 22690.00
28905.00 28905.00 28905.00 28905.00 31207.04 31207.04 31207.04 31207.04
31209.04 31209.04
 [26] 31209.04 31209.04 26323.00 26323.00 26323.00 26323.00 26338.00
26338.00 26338.00 26338.00 27310.96 27310.96 27310.96 27310.96 23588.04
23588.04 23588.04 23588.04 25255.00 25255.00 25255.00 25255.00 23752.00
23752.00 23752.00
 [51] 23752.00 29607.04 29607.04 29607.04 29607.04 27993.04 27993.04
27993.04 27993.04 28384.04 28384.04 28384.04 28384.04 26176.00 26176.00
26176.00 26176.00 28986.04 28986.04 28986.04 28986.04 28689.04 28689.04
28689.04 28689.04
 [76] 23722.00 23722.00 23722.00 23722.00 27353.00 27353.00 27353.00
27353.00 26303.00 26303.00 28803.96 28803.96 28803.96 28803.96 28564.04
28564.04 28564.04 28564.04 29826.96 29826.96 29826.96 29826.96 29826.96
30410.00 30410.00
[101] 30410.00 26490.04 26490.04 26490.04 26490.04       NA       NA
NA       NA 29765.96 29765.96 29765.96 29765.96 29765.96 26325.00 26325.00
26325.00 26325.00 26325.00 28824.00 28824.00 28824.00 28824.00 26808.00
26808.00
[126] 26808.00 26808.00 26808.00 28628.96 28628.96 28628.96 28628.96
23807.00 23807.00 23807.00 23807.00 23807.00       NA       NA       NA
NA       NA 25668.04 25668.04 25668.04 25668.04 28654.04 28654.04 28654.04
28654.04
[151] 21711.04 21711.04 21711.04 21711.04 27167.04 27167.04 27167.04
27167.04 24296.04 24296.04 24296.04 24296.04 30540.04 30540.04 25330.00
25330.00 25330.00 25330.00 25579.00 25579.00 25579.00 25579.00 29127.04
29127.04 29127.04
[176] 29127.04 29896.96 29896.96 29896.96 29896.96 25992.00 25992.00
25992.00 25992.00 26625.00 26625.00 26625.00 26625.00 30121.04 30121.04
30121.04 30121.04 21801.04 21801.04 21801.04 21801.04 31274.04 31274.04
25907.00 25907.00
[201] 25907.00 25907.00 28516.00 28516.00 28516.00 28516.00 28943.00
28943.00 28943.00 28943.00 29847.96 29847.96 29847.96 29847.96 30529.04
30529.04 30529.04 30529.04 30527.04 30527.04 30527.04 30527.04 29434.00
29434.00 29434.00
[226] 29434.00 29434.00 28631.04 28631.04 28631.04 28631.04 25761.04
25761.04 25761.04 25761.04 25761.04 26127.04 26127.04 26127.04 26127.04
26027.00 26027.00 26027.00 26027.00 28987.00 28987.00 28987.00 28987.00
29232.00 29232.00
[251] 29232.00 29232.00 26109.96 26109.96 26109.96 26109.96 31339.00
31339.00 29235.00 29235.00 29235.00 29235.00 28092.00 28092.00 28092.00
28092.00 30209.00 30209.00 30209.00 30209.00 30209.00 30281.00 30281.00
30281.00 30281.00
[276] 26880.96 26880.96 26880.96 26880.96 25691.04 25691.04 25691.04
25691.04 22938.04 22938.04 22938.04 22938.04 25878.00 25878.00 25878.00
25878.00 24470.00 24470.00 24470.00 24470.00 26046.96 26046.96 26046.96
26046.96 26763.96
[301] 26763.96 26763.96 26763.96 25720.96 25720.96 25720.96 25720.96
29214.00 29214.00 29214.00 29214.00 26992.00 26992.00 26992.00 26992.00
30659.00 30659.00 30659.00 30659.00 25600.00 25600.00 25600.00 25600.00
26842.00 26842.00
[326] 26842.00 26842.00 25541.00 25541.00 25541.00 25541.00 27386.00
27386.00 27386.00 27386.00 30302.04 30302.04 30302.04 30302.04 28059.00
28059.00 28059.00 28059.00       NA       NA       NA 25657.00 25657.00
NA       NA
[351]       NA 24835.00 24835.00 24835.00 24835.00 29340.96 29340.96
29340.96 29340.96 26473.96 26473.96 26473.96 26473.96 28873.00 28873.00
28873.00 28873.00 27690.00 27690.00 27690.00 27690.00 26554.00 26554.00
26554.00 26554.00
[376] 28876.00 28876.00 28876.00 28876.00 27156.96 27156.96 27156.96
27156.96 26577.00 26577.00 26577.00 26577.00 27471.00 27471.00 27471.00
27471.00 27323.00 27323.00 27323.00 27323.00 29232.00 29232.00 29232.00
29232.00       NA
[401]       NA       NA       NA 26523.96 26523.96 26523.96 26523.96
26538.96 26538.96 26538.96 26538.96 24374.96 24374.96 24374.96 24374.96
30798.00 30798.00 30798.00 30798.00       NA       NA       NA       NA
22775.04 22775.04
[426] 22775.04 22775.04 28464.00 28464.00 28464.00 28464.00 25763.04
25763.04 25763.04 25763.04 30114.04 30114.04 30114.04 30114.04 26864.04
26864.04 26864.04 26864.04       NA       NA       NA       NA 26945.04
26945.04 26945.04
[451] 26945.04 29528.96 29528.96 29528.96 29528.96 29058.04 29058.04
29058.04 29058.04 29456.00 29456.00 29456.00 29456.00 26450.96 26450.96
26450.96 26450.96 22837.96 22837.96 22837.96 22837.96 24222.96 24222.96
24222.96 24222.96
[476] 29592.00 29592.00 29592.00 29592.00 26573.00 26573.00 26573.00
26573.00 26573.00 24811.00 24811.00 24811.00 24811.00 24834.00 24834.00
24834.00 24834.00 31312.00 31312.00 31312.00 31312.00 23337.00 23337.00
23337.00 23337.00
[501] 26422.00 26422.00 26422.00 26422.00 22664.04 22664.04 22664.04
22664.04 23192.04 23192.04 23192.04 23192.04 27557.04 27557.04 27557.04
27557.04 23449.04 23449.04 23449.04 23449.04 27799.00 27799.00 27799.00
27799.00 28747.04
[526] 28747.04 28747.04 28747.04 24660.04 24660.04 24660.04 24660.04
NA       NA       NA       NA 24683.04 24683.04 24683.04 24683.04 26576.00
26576.00 26576.00 26576.00       NA       NA       NA       NA 28897.96
28897.96
[551] 28897.96 28897.96 25997.96 25997.96 25997.96 25997.96 24594.96
24594.96 24594.96 24594.96 25965.00 25965.00 25965.00 30139.04 30139.04
30139.04 26104.04 26104.04 26104.04 26104.04 26255.04 26255.04 26255.04
26255.04 28887.00
[576] 28887.00 28887.00 28887.00 28887.00       NA       NA       NA
NA 25470.00 25470.00 25470.00 25470.00 20677.96 20677.96 20677.96 20677.96
29227.00 29227.00 29227.00 29227.00       NA       NA       NA       NA
29543.96
[601] 29543.96 29543.96 31080.00 31080.00 31080.00 27710.00 27710.00
27710.00 27710.00       NA       NA       NA       NA       NA 29903.00
29903.00 29903.00 29903.00       NA       NA       NA       NA 24147.00
24147.00 24147.00
[626] 24147.00 23316.96 23316.96 23316.96 23316.96 27096.00 27096.00
27096.00 27096.00 25543.00 25543.00 25543.00 25543.00 25543.00       NA
NA       NA       NA 25131.04 25131.04 25131.04 29565.96 29565.96 29565.96
29565.96
[651] 28070.00 28070.00 28070.00 28070.00 28774.04 28774.04 28774.04
28774.04 28073.00 28073.00 28130.00 28130.00 28130.00 28130.00 20038.00
20038.00 20038.00 20038.00 27298.04 27298.04 27298.04 27298.04 27793.00
27793.00 27793.00
[676] 27793.00 25586.00 25586.00 25586.00 25586.00 26000.00 26000.00
26000.00 26000.00 30577.04 30577.04 30577.04 30577.04 27194.04 27194.04
27194.04 27194.04 23156.04 23156.04 23156.04 23156.04 23978.04 23978.04
23978.04 23978.04
[701]       NA       NA       NA       NA 24391.00 24391.00 24391.00
24391.00 27152.96 27152.96 27152.96 27152.96 28852.00 28852.00 28852.00
25419.00 25419.00 29212.00 29212.00 29212.00 29212.00 23660.00 23660.00
23660.00 23660.00
[726] 26022.96 26022.96 26022.96 26022.96 25566.00 25566.00 25566.00
25566.00 25336.96 25336.96 25336.96 25336.96 26931.96 26931.96 26931.96
26931.96 26758.00 26758.00 26758.00 26537.96 26537.96 26537.96 26537.96
27026.00 27026.00
[751] 27026.00       NA       NA       NA 24349.96 24349.96 24349.96
24349.96 25960.00 25960.00 25960.00 25960.00 27276.00 27276.00 27276.00
26826.96 26826.96 26826.96 26826.96 26428.96 26428.96 26428.96 26428.96
26780.96 26780.96
[776] 26780.96 26780.96 26301.00 26301.00 26301.00 26301.00 28385.96
28385.96 28385.96 28385.96 27210.96 27210.96 27210.96 23704.00 23704.00
23704.00 24160.04 24160.04 24160.04 24160.04 25703.96 25703.96 25703.96
25703.96 25269.00
[801] 25269.00 25269.00 25269.00 29886.96 29886.96 29886.96       NA
NA       NA       NA       NA       NA
attr(,"tzone")
[1] ""
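The non-missing differences are not all whole days either: values such as 29369.96 and 27120.04 lie about 0.04 day, roughly one hour, on either side of an integer. A quick check of that arithmetic on three values copied from the output above:

```r
# Day differences copied from the difftime() output above
d <- c(28899.00, 29369.96, 27120.04)

# Distance from the nearest whole day, converted to hours
offset_hours <- 24 * (d - round(d))
print(round(offset_hours))
```

A one-hour offset is what a daylight-saving mismatch between the two date-times would produce. Because the printed values are rounded to two decimal places, this is suggestive rather than conclusive.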

Here are the values of sdate and dob that correspond to the missing values in
difftime(sdate,dob):

> sdate[is.na(difftime(sdate,dob))]
 [1] "2002-02-28" "2002-02-28" "2002-02-28" "2002-02-28" "2002-07-30"
"2002-07-30" "2002-07-30" "2002-07-30" "2002-07-30" "2003-06-17"
"2003-06-17" "2003-06-17" "2003-10-30" "2003-10-30" "2003-10-30"
"2002-07-22" "2002-07-22"
[18] "2002-07-22" "2002-07-22" "2002-12-18" "2002-12-18" "2002-12-18"
"2002-12-18" "2003-03-10" "2003-03-10" "2003-03-10" "2003-03-10"
"2003-02-05" "2003-02-05" "2003-02-05" "2003-02-05" "2003-03-19"
"2003-03-19" "2003-03-19"
[35] "2003-03-19" "2003-05-29" "2003-05-29" "2003-05-29" "2003-05-29"
"2003-08-13" "2003-08-13" "2003-08-13" "2003-08-13" "2003-11-03"
"2003-11-03" "2003-11-03" "2003-11-03" "2003-11-03" "2002-06-25"
"2002-06-25" "2002-06-25"
[52] "2002-06-25" "2003-04-10" "2003-04-10" "2003-04-10" "2003-04-10"
"2003-04-03" "2003-04-03" "2003-04-03" "2003-04-03" "2003-10-15"
"2003-10-15" "2003-10-15" "2003-11-21" "2003-11-21" "2003-11-21"
"2003-12-04" "2003-12-04"
[69] "2003-12-04"
> dob[is.na(difftime(sdate,dob))]
 [1] "1927-04-03" "1927-04-03" "1927-04-03" "1927-04-03" "1925-04-11"
"1925-04-11" "1925-04-11" "1925-04-11" "1925-04-11" "1939-04-03"
"1939-04-03" "1939-04-03" "1940-12-30" "1940-12-30" "1940-12-30"
"1917-10-14" "1917-10-14"
[18] "1917-10-14" "1917-10-14" "1925-04-16" "1925-04-16" "1925-04-16"
"1925-04-16" "1927-04-05" "1927-04-05" "1927-04-05" "1927-04-05"
"1939-04-08" "1939-04-08" "1939-04-08" "1939-04-08" "1938-10-24"
"1938-10-24" "1938-10-24"
[35] "1938-10-24" "1930-10-16" "1930-10-16" "1930-10-16" "1930-10-16"
"1923-04-17" "1923-04-17" "1923-04-17" "1923-04-17" "1929-04-17"
"1929-04-17" "1929-04-17" "1929-04-17" "1929-04-17" "1925-04-11"
"1925-04-11" "1925-04-11"
[52] "1925-04-11" "1931-04-02" "1931-04-02" "1931-04-02" "1931-04-02"
"1929-04-18" "1929-04-18" "1929-04-18" "1929-04-18" "1917-10-22"
"1917-10-22" "1917-10-22" "1928-03-28" "1928-03-28" "1928-03-28"
"1928-04-09" "1928-04-09"
[69] "1928-04-09"

The values of dob here do not differ in any obvious way from those in the
rest of the dob vector, where difftime(sdate,dob) gives sensible results.

If I extract just these values and recompute difftime on them, the result is
the same:

s1=sdate[is.na(difftime(sdate,dob))]
d1=dob[is.na(difftime(sdate,dob))]
difftime(s1,d1)
Time differences in secs
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
attr(,"tzone")
[1] ""

However, if I create the first value of each of these vectors manually,
difftime works:-

js1=strptime('2002-02-28','%Y-%m-%d'); js1
#[1] "2002-02-28"
jb1=strptime('1927-04-03','%Y-%m-%d'); jb1
#[1] "1927-04-03"
difftime(js1,jb1)
#Time difference of 27360 days

So it appears that strptime handles these values differently within the
vector, but correctly one by one.
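The manual check above rebuilt single values from their printed dates. A vectorized sketch of the same idea re-parses every element of the modified vector, so that strptime() derives all components afresh from the corrected calendar date instead of keeping any left over from the original 20xx parse. The two input strings are hypothetical stand-ins for the real data, and tz = "UTC" is used only to make the example independent of the local time zone; on the real data the default local time zone would apply.

```r
# Hypothetical stand-in for the real data, parsed as in the original code
dob <- strptime(c("03-04-27", "14-07-22"), "%d-%m-%y", tz = "UTC")
dob$year <- dob$year - 100L   # century shift; other components are not recomputed

# Rebuild every element from its printed date, as the manual check did for
# one value, so all components are derived from the corrected year
dob_fixed <- strptime(format(dob, "%Y-%m-%d"), "%Y-%m-%d", tz = "UTC")
print(dob_fixed)
```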

I'm sorry if I'm being silly, but I can't see the problem. I'd be VERY
grateful if someone could help me to find it and fix it.

With many thanks in advance for your thoughts,

Jonathan Williams
