[Rd] sample (PR#1212)

maechler@stat.math.ethz.ch
Fri, 14 Dec 2001 09:30:58 +0100 (MET)


>>>>> "possolo" == possolo  <possolo@crd.ge.com> writes:

    possolo> Full_Name: Antonio Possolo
    possolo> Version: 1.3.1
    possolo> OS: Linux (RH 7.1), Windows 2000
    possolo> Submission from: (NULL)


    possolo> A FEATURE THAT EASILY GENERATES BUGS

    possolo> sample(pi, size=1) produces 1, 2, or 3.
    possolo> sample(c(pi, pi), size=1) produces 3.141593 always.

    possolo> Although this conforms with the behavior explained
    possolo> in the help page for "sample", the behavior for the
    possolo> case where x (in sample(x, ...)) has length 1 can
    possolo> easily lead to errors if x is generated
    possolo> automatically and one neglects to check its length
    possolo> before sampling from it.
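A short R session makes the two calls above concrete. The draws are random, so the sketch only checks which values can occur. It assumes a current R, where a non-integer length-1 x such as pi is truncated, so the population is 1:3:

```r
set.seed(1)  # arbitrary seed, only for reproducibility

## A length-1 numeric x >= 1 is read as 1:x, so sample(pi, 1) draws from 1:3
draws <- replicate(1000, sample(pi, size = 1))
table(draws)                # only the values 1, 2 and 3 can appear

## A length-2 x is sampled from directly, so the result is always pi
all(replicate(100, sample(c(pi, pi), size = 1)) == pi)   # TRUE
```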

I completely share your opinion. Only recently we (not I personally)
had a case where a user-written function failed in some cases
because it called   sample(iv, ...)   with an integer vector iv
that was sometimes (rarely) of length one.
In your case with `pi' (which is not an integer), we could at least
make sample() give a warning instead of silently coercing to
integer; in our case, however, iv[] *is* an integer vector, just
sometimes of length 1, so sample() cannot tell which meaning was
intended: the situation is inherently ``non-decidable''.
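The integer case cannot be caught by a coercion warning, because the argument is exactly what it looks like. In the sketch below, `pick.one()` is a hypothetical helper (not the function from the case mentioned above) that draws one element of iv:

```r
## Hypothetical helper: draw one element of iv at random
pick.one <- function(iv) sample(iv, size = 1)

set.seed(2)
pick.one(c(5L, 9L))   # 5 or 9, as intended
pick.one(9L)          # any of 1:9, not necessarily 9
```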

The reason sample() works as it does is S compatibility,
and that *is* important.
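Since sample() keeps this behaviour for S compatibility, code that must work for any length can sample positions rather than values. A minimal sketch, assuming a current R that has sample.int(); the name `resample` follows a helper of the same shape shown in the examples on R's help page for sample, and it is not part of the proposal discussed here:

```r
## Draw positions 1..length(x), then index x, so a length-1 x is
## never reinterpreted as 1:x. Extra arguments go to sample.int().
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(9L, 1)          # always 9
resample(c(5L, 9L), 1)   # 5 or 9
resample(pi, 1)          # always pi
```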

    possolo> I believe it would be safest to require x to be
    possolo> always the full set of values one wishes to sample
    possolo> from, and remove the special meaning that is
    possolo> attached to the case when x is of length 1.

What we *could* consider instead, without breaking backward compatibility,
is adding an extra argument `isVector', e.g.

   sample (x, size, replace = FALSE, prob = NULL, isVector = FALSE) 

such that if you use

     sample(pi, size = 1, isV = TRUE)

you would always get 3.14159...
Such an argument would at least make user-written code much nicer
to read than what is currently needed:

     i.rand <- if(length(iv) == 1) iv  else  sample(iv, .....)
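A minimal sketch of how the proposed argument could behave, written as a separate function so that it does not mask sample(). The name `sample2` is hypothetical, `isVector` is only the argument proposed above (it does not exist in R's sample()), and the use of sample.int() assumes a current R:

```r
## Hypothetical wrapper illustrating the proposed `isVector' argument.
sample2 <- function(x, size, replace = FALSE, prob = NULL, isVector = FALSE) {
    if (!isVector)  # unchanged, S-compatible behaviour
        return(sample(x, size, replace, prob))
    if (missing(size)) size <- length(x)
    ## Always treat x as the set of values: sample positions, then index
    x[sample.int(length(x), size, replace, prob)]
}

sample2(pi, size = 1, isV = TRUE)   # always pi (isV partially matches isVector)
sample2(pi, size = 1)               # 1, 2 or 3, like sample(pi, size = 1)

## The length test in the workaround quoted above is then unnecessary
iv <- 9L
i.rand <- sample2(iv, 1, isVector = TRUE)   # 9
```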


Other opinions? (Also from Insightful: this would be worth
doing in all implementations of S.)

Martin Maechler <maechler@stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._