[Rd] Posting Guide

Gabor Grothendieck ggrothendieck at gmail.com
Sat Jun 7 19:24:27 CEST 2008


Here is another update.  I have added the following:

- info about using a fresh R session.  (In that case ls() output is less
  essential; however, the developers of sessionInfo() might consider
  adding that as a default or as an option.)

- questioner should consider use of functions.

- for data use dump("x", file = "") to display data reproducibly, or use
  builtin datasets listed by data()

- minimal versions of slow code should be presented in cases where
  questioner is looking for faster code.

- we still need to add links to illustrative sample questions in r-help

The following were not added for the reason cited:

- guide is not just for questioners.  Important to distinguish roles
  of questioner, responder and reader.

- what is to be provided ought to be given a name to make it easier
  to refer to.  An unlabelled set of points is too vague.  Test
  framework seems appropriately descriptive.  By giving it a name
  one can request that a questioner "provide a test framework as
  defined in the posting guide summary".

- self contained is not implied by reproducible.  Reproducible
  only means that info is available somewhere -- not that it's all
  available right in the questioner's post and all in a manner that
  is readily accessible.

- focus should be on making data minimal.  Don't like attachments
  since responder must save them and read them in.  It encourages
  use of large rather than minimal data sets.

Summary

Surprisingly, the main problem for responders is not to answer the
question but to quickly figure out what the question is, reproduce it
in their own R session and test their answer.

Test Framework.  To facilitate this, provide a test framework of:

  (1) minimal reproducible self-contained commented code and data
      that has been run in a fresh R session.  That means code and
      data have been cut down as far as possible to the essentials
      needed to illustrate the problem and were run just after
      starting up R.  Also it means that it's possible for responders
      to just copy the code and data section from the questioner's
      post to the clipboard and paste it into their session to see
      the same output without having to enter even one R command.
      In some cases there may be an advantage to present the code as
      a function and in the case of needing a speedup be sure to post
      a minimal version of the slow code.  Use builtin data sets such
      as those listed by data() to illustrate problem or reduce your
      data to a minimum and present it reproducibly by using:
         dump("mydata", file = "")

  (2) comments/explanation of what the code is intended to produce
      -- Don't assume it's obvious!

  (3) versions of all software used, e.g. sessionInfo(),
      or R.version.string; packageDescription("zoo")$Version
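
As a rough sketch of items (1) and (3), the snippet below builds a small
data frame, prints it with dump() in a form that can be pasted into a post,
and checks that the dumped text recreates the same object.  The object
name mydata and its contents are made up for illustration, and the names
tmp, env and roundtrip_ok are hypothetical helpers.

```r
# Hypothetical example data; "mydata" is only a placeholder name.
mydata <- data.frame(id = 1:3, grp = c("a", "b", "a"), y = c(1.5, 2.25, 4))

# dump() takes the object's name as a character string.  With file = ""
# the R source that recreates the object is printed to the console.
dump("mydata", file = "")

# Check that the dumped text really rebuilds an equal object by sourcing
# it into a separate environment and comparing.
tmp <- tempfile(fileext = ".R")
dump("mydata", file = tmp)
env <- new.env()
source(tmp, local = env)
roundtrip_ok <- isTRUE(all.equal(mydata, env$mydata))

# Version information, item (3).
R.version.string
```

The text printed by dump() is what would go into the post, followed by
the code that uses mydata.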

Without self-contained reproducible code the responder must not only
understand the question but must also create a test framework and that
typically takes more time than answering the question!  It's not fair
to ask the responder to provide all that on top of answering the
question.  Do NOT assume the problem is so simple that it is not
necessary.
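
For illustration only, a post asking for a speedup might contain something
like the sketch below: the slow code cut down and wrapped in a function,
a comment saying what it is meant to produce, a small input, and the
software versions.  The function slow_cumsum and the input x are invented
for this example and do not come from any actual post.

```r
# Intended result: the cumulative sum of x, i.e. out[i] == sum(x[1:i]).
# Question: this loop is slow for long x; is there a faster equivalent?
slow_cumsum <- function(x) {
  out <- numeric(0)
  for (i in seq_along(x)) {
    out <- c(out, sum(x[1:i]))   # grows the vector and re-sums on every pass
  }
  out
}

x <- as.numeric(1:2000)          # small input, enough to time the function
system.time(res <- slow_cumsum(x))

# Software versions, item (3) of the test framework.
sessionInfo()
```

A responder can paste this into a fresh session, time it, and compare a
proposed replacement against res.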

Effort. The effort taken to reduce the problem to its essentials and
produce a test framework often solves the problem, avoiding the need
for a post in the first place.  It at the least shows that the
questioner tried to solve it themself.

Subscribers.  The questioner should ensure that the thread is complete
and that it has an appropriate Subject.  The purpose of the post is
not only to help the questioner but also the other list subscribers
and those later searching the archives.





On Sat, Jun 7, 2008 at 9:38 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Here is a second version of the summary.  It's been rearranged to
> place the most important info at the top.  Also shortened it a bit.
>
> It still needs links to example posts, as suggested.  Anyone?
>
> Summary
>
> Surprisingly, the main problem for responders is not to answer the
> posted questions but to quickly figure out what the question is, reproduce
> it in their own R session and test their answer.
>
> Test Framework.  To facilitate that, provide a test framework of:
>
>  (1) reproducible self-contained minimal code and data.  That means
>      responders can copy it from the questioner's post and paste it
>      into their session to see the same output without having to
>      enter even one R command.
>      NB. dput(mydata) produces mydata in reproducible form.
>  (2) comments/explanations of what the code is intended to produce and
>  (3) versions of all software used, e.g. sessionInfo().
>
> Without self-contained reproducible code the responder must not only
> understand the question but must also create a test framework and that
> typically takes more time than answering the question!  It's not fair
> to ask the responder to provide all that on top of answering the
> question.  Do NOT assume the problem is so simple that it is not
> necessary.
>
> Effort. The effort taken to reduce the problem to its essentials and
> produce a test framework often solves the problem avoiding the need
> for a post in the first place.  It at the least shows that the
> questioner tried to solve it themself.
>
> Subscribers.  The questioner should ensure that the thread is complete
> and that it has an appropriate Subject.  The purpose of the post is
> not only to help the questioner but also the other list subscribers
> and those later searching the archives.
>
>
>
> On Fri, Jun 6, 2008 at 1:30 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> People read the posting guide yet they are still unable to create an acceptable
>> post. e.g.
>> https://stat.ethz.ch/pipermail/r-help/2008-June/164092.html
>>
>> I think the problem is that the guide is not clear or concise enough.
>> I suggest we add a summary at the beginning which gets to the heart
>> of what a poster is expected to provide:
>>
>> Summary
>>
>> To maximize your chance of getting a response when posting provide (1)
>> commented,
>> (2) minimal, (3) self-contained and (4) reproducible code.  (This one
>> line summary
>> also appears at the end of each message to r-help.)
>>
>> "Self-contained" and "reproducible" mean that a responder can copy the
>> questioner's code to
>> the clipboard, paste it into their R session and see the same problem
>> you as the questioner
>> see.  Note that dput(mydata) will display mydata in a reproducible way.
>> Self-contained and reproducible are needed because:
>> (1) Self-Effort. It shows that the questioner tried to solve the
>> problem by themself first.
>> (2) Test framework. Often the responder needs to play with the code a
>> bit in order to respond
>> or at least to give the best answer.  They can't do that without a
>> test framework that includes
>> the data and the code to run it and it's not fair to ask them to not
>> only answer the question but
>> also to come up with test data and to complete incomplete code.
>> (3) Archives. Questions and answers go into the archives so they are
>> not only for the benefit of
>> the questioner but also for the benefit of all future searchers of
>> the archive.  That means
>> that it's not finished if you have solved the problem for yourself.
>> You still need to ensure that
>> the thread has a complete solution. (For that reason it's also
>> important to give a meaningful
>> subject to each post.)
>>
>> "Commented" and "minimal" also reduce the time it takes to understand
>> the problem.
>> Don't just dump your code as is into the message since you are just
>> wasting your own
>> time. It's not likely anyone will answer a message if the questioner
>> has not taken the
>> time to reduce it to its essential elements.  Surprisingly, quite
>> often understanding what
>> the problem is takes the responder most of the time -- not solving the
>> problem. Once the
>> question is actually understood it's often quite fast to answer.  Thus
>> in addition to posting
>> it in a minimal form, comment on it sufficiently so that the responder
>> knows what the code
>> does and is intended to produce.  It may be obvious to the questioner
>> who is embroiled in
>> the problem but that does not mean it's obvious to others.
>>
>> Introduction
>>
>> .... rest of posting guide ...
>>
>


