[R] Unit Testing Frameworks: summary and brief discussion

Thu May 10 03:35:15 CEST 2007

Hi

Paul Gilbert wrote:
> Tony
> 
> Thanks for the summary.
> 
> My ad hoc system is pretty good for catching flagged errors, and 
> numerical errors when I have a check.  Could you (or someone else) 
> comment on how easy it would be with one of these more formal frameworks 
> to do three things I have not been able to accomplish easily:
> 
> - My code gives error and warning messages in some situations. I want to 
> test that the errors and warnings work, but these flags are the correct 
> response to the test. In fact, it is an error if I don't get the flag. 
> How easy is it to set up automatic tests to check warning and error 
> messages work?
> 
> - For some things it is the printed format that matters. How easy is it 
> to set up a test of the printed output? (Something like the Rout files 
> used in R CMD check.) I think this is what Tony Plate is calling 
> transcript file tests, and I guess it is not automatically available. I 
> am not really interested in something I would have to change with each 
> new release of R, and I need it to work cross-platform. I want to know 
> when something has changed, in R or my own code, without having to 
> examine the output carefully.
> 
> - (And now the hard one.) For some things it is the plotted output that 
> matters. Is it possible to set up automatic tests of plotting? I can 
> already test that plots run. I want to know if they "look very 
> different". And no, I don't have a clue where to start on this one.

For text-based graphics formats, you can just use diff;  for raster
formats, you can do per pixel comparisons.  These days there is
ImageMagick to do a compare and it will even produce an image of the
difference.  I have an old package called graphicsQC (not on CRAN) that
implemented some of these ideas (there was a talk at DSC 2003, see
http://www.stat.auckland.ac.nz/~paul/index.html).  A student worked on a
much better approach more recently, but I haven't put that up on the web
yet.  Let me know if you'd like to take a look at the newer package (it
would help to have somebody nagging me to get it finished off).

Paul

> Paul Gilbert
> 
> anthony.rossini at novartis.com wrote:
>> Greetings -
>>
>> I'm finally finished review, here's what I heard:
>>
>> ============ from Tobias Verbeke:
>>
>> anthony.rossini at novartis.com wrote:
>>> Greetings!
>>>
>>> After a quick look at current programming tools, especially with regards 
>>> to unit-testing frameworks, I've started looking at both "butler" and 
>>> "RUnit".   I would be grateful to receieve real world development 
>>> experience and opinions with either/both.    Please send to me directly 
>>> (yes, this IS my work email), I will summarize (named or anonymous, as 
>>> contributers desire) to the list.
>>>
>> I'm founding member of an R Competence Center at an international 
>> consulting company delivering R services
>> mainly to the financial and pharmaceutical industries. Unit testing is 
>> central to our development methodology
>> and we've been systematically using RUnit with great satisfaction, 
>> mainly because of its simplicity. The
>> presentation of test reports is basic, though. Experiences concerning 
>> interaction with the RUnit developers
>> are very positive: gentle and responsive people.
>>
>> We've never used butler. I think it is not actively developed (even if 
>> the developer is very active).
>>
>> It should be said that many of our developers (including myself) have 
>> backgrounds in statistics (more than in cs
>> or software engineering) and are not always acquainted with the 
>> functionality in other unit testing frameworks
>> and the way they integrate in IDEs as is common in these other languages.
>>
>> I'll soon be personally working with a JUnit guru and will take the 
>> opportunity to benchmark RUnit/ESS/emacs against
>> his toolkit (Eclipse with JUnit- and other plugins, working `in perfect 
>> harmony' (his words)). Even if in my opinion the
>> philosophy of test-driven development is much more important than the 
>> tools used, it is useful to question them from
>> time to time and your message reminded me of this... I'll keep you 
>> posted if it interests you. Why not work out an
>> evaluation grid / check list for unit testing frameworks ?
>>
>> Totally unrelated to the former, it might be interesting to ask oneself 
>> how ESS could be extended to ease unit testing:
>> after refactoring a function some M-x ess-unit-test-function 
>> automagically launches the unit-test for this particular
>> function (based on the test function naming scheme), opens a *test 
>> report* buffer etc.
>>
>> Kind regards,
>> Tobias
>>
>> ============ from Tony Plate:
>>
>> Hi, I've been looking at testing frameworks for R too, so I'm interested 
>> to hear of your experiences & perspective.
>>
>> Here's my own experiences & perspective:
>> The requirements are:
>>
>> (1) it should be very easy to construct and maintain tests
>> (2) it should be easy to run tests, both automatically and manually
>> (3) it should be simple to look at test results and know what went wrong 
>> where
>>
>> I've been using a homegrown testing framework for S-PLUS that is loosely 
>> based on the R transcript style tests (run *.R and compare output with 
>> *.Rout.save in 'tests' dir).  There are two differences between this 
>> test framework and the standard R one:
>> (1) the output to match and the input commands are generated from an 
>> annotated transcript (annotations can switch some tests in or out 
>> depending on the version used)
>> (2) annotations can include text substitutions (regular expression 
>> style) to be made on the output before attempting to match (this helps 
>> make it easier to construct tests that will match across different 
>> versions that might have minor cosmetic differences in how output is 
>> formatted).
>>
>> We use this test framework for both unit-style tests and system testing 
>> (where multiple libraries interact and also call the database).
>> One very nice aspect of this framework is that it is easy to construct 
>> tests -- just cut and paste from a command window.  Many tests can be 
>> generated very quickly this way (my impression is that is is much much 
>> faster to build tests by cutting and pasting transcripts from a command 
>> window than it is to build tests that use functions like all.equal() to 
>> compare data structures.) It is also easy to maintain tests in the face 
>> of change (e.g., with a new version of S-PLUS or with bug fixes to 
>> functions or with changed database contents) -- I use ediff in emacs to 
>> compare test output with the stored annotated transcript and can usually 
>> just use ediff commands to update the transcript.
>>
>> This has worked well for us and now we are looking at porting some code 
>> to R.  I've not seen anything that offers these conveniences in R.
>>
>> It wouldn't be too difficult to add these features to the built-in R 
>> testing framework, but I've not had success in getting anyone in R core 
>> to listen to even consider changes, so I've not pursued that route after 
>> an initial offer of some simple patches to tests.mk and wintests.mk.
>>
>> RUnit doesn't have transcript-style tests, but it wasn't very difficult 
>> to add support for transcript-style tests to it.  I'll probably go ahead 
>> and use some version of that for our porting project.  (And offer it to 
>> the community if the RUnit maintainers want to incorporate it.)  I also 
>> like the idea that RUnit has some code analysis tools -- that might 
>> support some future project that allowed one to catalogue the number of 
>> times each code path through a function was exercised by the tests.
>>
>> I just looked at 'butler' and it looks very much like RUnit to me -- and 
>> I didn't see any overview that explained differences.  Do you know of 
>> any differences?
>>
>> cheers,
>>
>> Tony Plate
>>
>>
>> ============== from Paul Gilbert:
>>
>> Tony
>>
>> While this is not exactly your question, I have been using my own system 
>>   based on make and the tools use by R CMD build/check to do something I 
>> think of as unit testing. This pre-dates the unit-testing frameworks, in 
>> fact, some of it predates R. I actually wrote something on this at one 
>> point: Paul Gilbert. R package maintenance. R News, 4(2):21-24, 
>> September 2004.
>>
>> I have occasionally thought about trying to use RUnit, but never done 
>> much because I am relatively happy with what I have. (Inertia is an 
>> issue too.)  I would be happy to hear if you do an assessment of the 
>> various tools.
>>
>> Best,
>> Paul Gilbert
>>
>>
>> ============= From Seth Falcon:
>>
>> Hi Tony,
>>
>> anthony.rossini at novartis.com writes:
>>> After a quick look at current programming tools, especially with regards 
>>> to unit-testing frameworks, I've started looking at both "butler" and 
>>> "RUnit".   I would be grateful to receieve real world development 
>>> experience and opinions with either/both.    Please send to me directly 
>>> (yes, this IS my work email), I will summarize (named or anonymous, as 
>>> contributers desire) to the list.
>> I've been using RUnit and have been quite happy with it.  I had not
>> heard of butler until I read your mail (!).
>>
>> RUnit behaves reasonably similarly to other *Unit frameworks and this
>> made it easy to get started with as I have used both JUnit and PyUnit
>> (unittest module).
>>
>> Two things to be wary of:
>>
>>   1. At last check, you cannot create classes in unit test code and
>>      this makes it difficult to test some types of functionality.  I'm
>>      really not sure to what extent this is RUnit's fault as opposed
>>      to limitation of the S4 implemenation in R.
>>
>>   2. They have chosen a non-default RNG, but recent versions provide a
>>      way to override this.  This provided for some difficult bug
>>      hunting when unit tests behaved differently than hand-run code
>>      even with set.seed().
>>
>> The maintainer has been receptive to feedback and patches.  You can
>> look at the not-so-beautiful scripts and such we are using if you look
>> at inst/UnitTest in: Category, GOstats, Biobase, graph
>>
>> Best Wishes,
>>
>> + seth
>>
>>
>> =================== Discussion:
>>
>> After a bit more cursory use, it looks like RUnit is probably the right 
>> approach at this time (sorry Hadley!).   Both RUnit and butler have a 
>> range of testing facilities and programming support tools.   I support the 
>> above statements about feasibility and problems -- except that I didn't 
>> get a chance to checkout the S4 issues that Seth raised above.    The one 
>> piece that I found missing in my version was some form of GUI based 
>> tester, i.e. push a button and test, but I think I've not thought through 
>> some of the issues with environments and closures yet that might cause 
>> problems.
>>
>> Thanks to everyone for responses!  It's clear that there is a good start 
>> here, but lots of room for improvement exists.
>>
>> Best regards / Mit freundlichen Grüssen, 
>> Anthony (Tony) Rossini
>> Novartis Pharma AG
>> MODELING & SIMULATION
>> Group Head a.i., EU Statistical Modeling
>> CHBS, WSJ-027.1.012
>> Novartis Pharma AG
>> Lichtstrasse 35
>> CH-4056 Basel
>> Switzerland
>> Phone: +41 61 324 4186
>> Fax: +41 61 324 3039
>> Cell: +41 79 367 4557
>> Email : anthony.rossini at novartis.com
>>
>> 	[[alternative HTML version deleted]]
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> ====================================================================================
> 
> La version française suit le texte anglais.
> 
> ------------------------------------------------------------------------------------
> 
> This email may contain privileged and/or confidential inform...{{dropped}}
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
paul at stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/