[Rd] Regression in strptime

Lukas Stadler lukas.stadler at oracle.com
Tue Mar 15 15:36:03 CET 2016


Hi!

Some context for the tests Mick mentioned:
Our tests, which are part of the open-source FastR repository, consist of small executable R snippets.
While working on FastR, we regularly run them on both FastR and GNU R and compare the output.
Every difference hints at a problem in our implementation of the R language.

FastR, along with the tests, is based on a specific version of R, and every once in a while, we update this R version - we recently went from 3.1.3 to 3.2.4.
This is a complex process: we implement new builtins, modify and update tests, etc.
During that process we need to investigate any new differences that appear, and that’s how the strptime issue came to our attention.
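
For reference, the kind of difference involved can be checked without .Internal. The following is only a sketch, not one of our actual tests: it assumes 100 consecutive years stand in for the sampled vector from the quoted thread, and the variable names are made up for illustration. It parses the dates once as a vector and once element by element, then compares the DST flags.

```r
# Assumption: 100 consecutive years stand in for the original sampled vector.
dates <- paste0(1900:1999, "/01/01")
fmt <- "%Y/%m/%d"

# Parse the whole vector in a single call.
vectorized <- strptime(dates, fmt, tz = "CET")

# Parse each element on its own and collect its DST flag.
elementwise_dst <- vapply(
  dates,
  function(d) strptime(d, fmt, tz = "CET")$isdst,
  integer(1),
  USE.NAMES = FALSE
)

# TRUE when the vectorized call agrees with the element-wise calls.
consistent <- identical(vectorized$isdst, elementwise_dst)
print(consistent)
```

The actual DST flags depend on the time zone database in use, so the sketch only checks that both ways of calling strptime agree, not which answer is historically right.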

These tests could theoretically also be used to detect changes in behavior between R versions.
It’s not that easy, though, since the output will differ depending on operating systems, compilers, and configuration details.
For FastR, we actually had to choose a platform with which we want to be consistent, because Java, with a tighter spec than C, does not have the same variations.

The set of tests can be seen in this file (which is autogenerated from our junit tests):
https://raw.githubusercontent.com/graalvm/fastr/master/com.oracle.truffle.r.test/src/com/oracle/truffle/r/test/ExpectedTestOutput.test
It’s ~20k individual tests, partly written by hand, partly generated algorithmically, and partly generated by the testR project.

I’ve attached a small script that I just hacked together that runs these tests - it could easily be adapted to compare the output of two different R installations.
Do you think this could be turned into a tool useful for R core development?
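
To give an idea of what such a comparison could look like, here is a minimal sketch. It is not the attached script: the helper names run_snippet and compare_installations are hypothetical, and it assumes each installation provides an Rscript binary that accepts -e. Each snippet is run under both binaries and the captured output is compared line by line.

```r
# Hypothetical helper: run one R snippet with a given Rscript binary and
# return everything it printed (stdout and stderr) as a character vector.
run_snippet <- function(rscript, snippet) {
  out <- suppressWarnings(
    system2(rscript, c("--vanilla", "-e", shQuote(snippet)),
            stdout = TRUE, stderr = TRUE)
  )
  as.character(out)  # drop the "status" attribute set on non-zero exit
}

# Hypothetical helper: one logical per snippet, TRUE when both
# installations produced identical output.
compare_installations <- function(rscript_a, rscript_b, snippets) {
  vapply(
    snippets,
    function(s) identical(run_snippet(rscript_a, s), run_snippet(rscript_b, s)),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Smoke test: compare the running installation against itself.
rscript <- file.path(R.home("bin"), "Rscript")
same <- compare_installations(rscript, rscript,
                              c("cat(1 + 1)", "print(letters[1:3])"))
print(same)
```

For a real comparison, rscript_a and rscript_b would point at the Rscript binaries of the two installations, and the snippets would come from the test file above. Output that legitimately differs between platforms would still need to be filtered out.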

- Lukas


> On 15 Mar 2016, at 11:52, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> 
>>>>>> peter dalgaard <pdalgd at gmail.com>
>>>>>>    on Sat, 12 Mar 2016 19:11:40 +0100 writes:
> 
>> OK, .Internal is not necessary to reproduce oddity in this area. I also see things like (notice 1980)
>>> strptime(paste0(sample(1900:1999,80,replace=TRUE),"/01/01"), "%Y/%m/%d", tz="CET")
>    ...............
> 
>> The issue seems to be present in R-devel but not in (CRAN) 3.2.0
> 
> nor in R 3.2.3 (and earlier), but indeed unfortunately in 3.2.4.
> 
> This has been fixed now in  "R 3.2.4 patched"  (and R-devel of course).
> Thank you Mick, for the report...
> ...
> ...
> though I "must" add: If you do have your own tests / checks (as
> you said in the OP) and are a company as big as Oracle using the
> free (in the full sense of "speech" *and* "beer") software R, 
> it would be *really* *really* courteous if you did run your test
> suite when we announce and release betas or release candidates
> ("RC") (and in the case of the upcoming yearly release in April,
> even "alphas" before them) so we, the R community and the R core
> developers could find bugs *before* release. 
> 
> Thank you -- and others, please! -- in advance for doing it next time, i.e.,
> *now*: The R web page  https://www.r-project.org/  (for a few weeks) has the news
> 
> o  R version 3.3.0 (Supposedly Educational) prerelease versions will appear starting Monday 2016-03-14. Final release is scheduled for Thursday 2016-04-14.
> 
> Martin Maechler
> ETH Zurich (and R Core team)
> 
> 
> 
>>> On 12 Mar 2016, at 17:43 , Mick Jordan <mick.jordan at oracle.com> wrote:
>>> 
>>> On 3/12/16 12:33 AM, peter dalgaard wrote:
>>>>> On 12 Mar 2016, at 00:05 , Mick Jordan <mick.jordan at oracle.com> wrote:
>>>>> 
>>>>> This is definitely obscure but we had a unit test that called .Internal(strptime, "1942/01/01", %Y/%m/%d") with timezone (TZ) set to CET.
>>>> Umm, that doesn't even parse. And fixing the typo, it doesn't run:
>>>> 
>>>>> .Internal(strptime, "1942/01/01", %Y/%m/%d")
>>>> Error: unexpected SPECIAL in ".Internal(strptime, "1942/01/01", %Y/%"
>>>>> .Internal(strptime, "1942/01/01", "%Y/%m/%d")
>>>> Error in .Internal(strptime, "1942/01/01", "%Y/%m/%d") :
>>>> 3 arguments passed to '.Internal' which requires 1
>>>> 
>>>> 
>>>> 
>>>>> In R-3.1.3 that returned "1942-01-01 CEST" which, paradoxically, is correct as they evidently did strange things in Germany during the war period. Java also returns the same. However, R-3.2.4 returns "1942-01-01 CET".
>>>> Did you mean:
>>>> 
>>>> pd$ r-release-branch/BUILD-dist/bin/R
>>>> 
>>>> R version 3.2.4 Patched (2016-03-10 r70319) -- "Very Secure Dishes"
>>>> Copyright (C) 2016 The R Foundation for Statistical Computing
>>>> Platform: x86_64-apple-darwin13.4.0/x86_64 (64-bit)
>>>> [...]
>>>>> strptime("1942/01/01", "%Y/%m/%d", tz="CET")
>>>> [1] "1942-01-01 CEST"
>>>> 
>>>> But then as you see, it does have DST on New Years Day.
>>>> 
>>>> All in all, there is something you are not telling us.
>>>> 
>>>> Notice that all DST information is OS dependent as it depends on which version of the "Olson database" is installed.
>>>> 
>>>> 
>>> You are correct that I was sloppy with syntax for the example. We are, for better or worse, calling the .Internal, but actually with a large vector of arguments, of which the 1942 entry is element 82. I can confirm that for the vector of length 1 example that I didn't test but just assumed would also fail, the answer is correct. However, it is not for the full vector:
>>> 
>>>> .Internal(strptime(argv[[1]], argv[[2]], "CET"))
>>> [1] "1937-01-01 CET" "1916-01-01 CET" "1913-01-01 CET" "1927-01-01 CET"
>>> [5] "1947-01-01 CET" "1913-01-01 CET" "1917-01-01 CET" "1923-01-01 CET"
>>> [9] "1921-01-01 CET" "1926-01-01 CET" "1920-01-01 CET" "1915-01-01 CET"
>>> [13] "1914-01-01 CET" "1914-01-01 CET" "1914-01-01 CET" "1919-01-01 CET"
>>> [17] "1948-01-01 CET" "1911-01-01 CET" "1909-01-01 CET" "1913-01-01 CET"
>>> [21] "1925-01-01 CET" "1926-01-01 CET" "1910-01-01 CET" "1917-01-01 CET"
>>> [25] "1936-01-01 CET" "1938-01-01 CET" "1960-01-01 CET" "1915-01-01 CET"
>>> [29] "1919-01-01 CET" "1924-01-01 CET" "1914-01-01 CET" "1905-01-01 CET"
>>> [33] "1921-01-01 CET" "1929-01-01 CET" "1926-01-01 CET" "1921-01-01 CET"
>>> [37] "1908-01-01 CET" "1928-01-01 CET" "1919-01-01 CET" "1921-01-01 CET"
>>> [41] "1925-01-01 CET" "1934-01-01 CET" "1927-01-01 CET" "1928-01-01 CET"
>>> [45] "1934-01-01 CET" "1922-01-01 CET" "1923-01-01 CET" "1915-01-01 CET"
>>> [49] "1934-01-01 CET" "1925-01-01 CET" "1922-01-01 CET" "1930-01-01 CET"
>>> [53] "1924-01-01 CET" "1923-01-01 CET" "1919-01-01 CET" "1932-01-01 CET"
>>> [57] "1930-01-01 CET" "1923-01-01 CET" "1930-01-01 CET" "1922-01-01 CET"
>>> [61] "1919-01-01 CET" "1932-01-01 CET" "1939-01-01 CET" "1923-01-01 CET"
>>> [65] "1920-01-01 CET" "1919-01-01 CET" "1952-01-01 CET" "1927-01-01 CET"
>>> [69] "1924-01-01 CET" "1919-01-01 CET" "1925-01-01 CET" "1945-01-01 CET"
>>> [73] "1916-01-01 CET" "1943-01-01 CET" "1920-01-01 CET" "1920-01-01 CET"
>>> [77] "1931-01-01 CET" "1924-01-01 CET" "1919-01-01 CET" "1926-01-01 CET"
>>> [81] "1920-01-01 CET" "1942-01-01 CET" "1919-01-01 CET" "1930-01-01 CET"
>>> [85] "1925-01-01 CET" "1924-01-01 CET" "1926-01-01 CET" "1918-01-01 CET"
>>> [89] "1922-01-01 CET" "1921-01-01 CET" "1925-01-01 CET" "1928-01-01 CET"
>>> [93] "1925-01-01 CET" "1929-01-01 CET" "1933-01-01 CET" "1947-01-01 CET"
>>> [97] "1950-01-01 CET" "1945-01-01 CET" "1924-01-01 CET" "1939-01-01 CET"
>>> [101] "1924-01-01 CET" "1933-01-01 CET" "1928-01-01 CET"
>>>> .Internal( strptime("1942/01/01", "%Y/%m/%d", ''))
>>> [1] "1942-01-01 CEST"
>>>>> argv[[1]][[82]]
>>> [1] "1942/01/01"
>>> 
>>> We actually pass "" as the timezone, having set TZ=CET in the shell.
>>> 
>>> I am attaching a file that defines the large vector for sourcing.
>>> 
>>> Mick
>>> 
>>> <pbug.r>
> 
>> -- 
>> Peter Dalgaard, Professor,
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Office: A 4.23
>> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
