[Rd] Using sample() to sample one value from a single value?

Henrik Bengtsson hb at biostat.ucsf.edu
Wed Nov 3 19:19:47 CET 2010


On Wed, Nov 3, 2010 at 11:07 AM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
> On Wed, Nov 3, 2010 at 11:02 AM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
>> Doesn't the resample function in the Examples section of the sample help
>> page do this?
>
> Yes, I just noticed that one [at the very end of the example in
> help("sample")].  So, maybe resample() should be a function available
> in R?

So for completeness, this has also been discussed in the R-devel thread
'[patch] add is.set parameter to sample()' started on 2010-03-23, cf.
http://www.mail-archive.com/r-devel@r-project.org/msg19998.html.
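
For reference, here is a minimal sketch of the index-based idiom that the
resample() example in help("sample") relies on: draw positions with
sample.int() and then subset, so a length-one vector is never expanded to
1:x.  The name sampleByIndex() is made up here for illustration only; it is
not a function in R.

```r
# Hypothetical helper (not part of base R): sample positions, then subset.
sampleByIndex <- function(x, size = length(x), ...) {
  # sample.int(n, size) draws from 1:n, so length(x) == 1 always yields index 1.
  x[sample.int(length(x), size = size, ...)]
}

sampleByIndex(10, size = 1)     # always 10
sampleByIndex(1.2, size = 1)    # always 1.2, no coercion to integer
sampleByIndex(c(3, 7, 9))       # a random permutation of 3, 7, 9
```

Unlike a special case for length(x) == 1, this passes 'size' and any
further arguments (e.g. replace=TRUE) through for every length of x.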

/Henrik

>
> /Henrik
>
>>
>> On Wed, Nov 3, 2010 at 3:54 PM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
>>
>>> Hi, consider this one as an FYI, or a seed for further discussion.
>>>
>>> I am aware that many traps with sample() have been reported over the
>>> years.  I know that these are also documented in help("sample").  Still
>>> I got bitten by this while writing
>>>
>>> sample(units, size=length(units));
>>>
>>> where 'units' is an index (positive integer) vector.  It works as I
>>> expect in all cases except for length(units) == 1.  I know, it is well
>>> known.  However, it made me wonder whether it is possible to use
>>> sample() to draw a single value from a set containing only one value.
>>> I don't think so, unless you draw from a value that is <= 1.
>>>
>>> For instance, you can sample from c(10,10) by doing:
>>>
>>> > sample(rep(10, times=2), size=2);
>>> [1] 10 10
>>>
>>> but you cannot sample from c(10) by doing:
>>>
>>> > sample(rep(10, times=1), size=1);
>>> [1] 9
>>>
>>> unless you sample from a value <= 1, e.g.
>>>
>>> > sample(rep(0.31, times=1), size=1);
>>> [1] 0.31
>>>
>>> > sample(rep(-10, times=1), size=1);
>>> [1] -10
>>>
>>> Note also the related issue of sampling from a double vector of length 1,
>>> e.g.
>>>
>>> > sample(rep(1.2, times=2), size=2);
>>> [1] 1.2 1.2
>>> > sample(rep(1.2, times=1), size=1);
>>> [1] 1
>>>
>>> In the latter case 1.2 is coerced to an integer, so sample() draws from 1:1.
>>>
>>> All of the above makes sense when one studies the code of sample(), but
>>> sample() is indeed dangerous, e.g. imagine how many bootstrap
>>> estimates out there quietly come out incorrect.
>>>
>>>
>>> In order to cover all cases of length(units), including one, a solution is:
>>>
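>>> sampleFrom <- function(x, size=length(x), ...) {
>>>  n <- length(x);
>>>  if (n == 1L) {
>>>    # A single value is returned as-is, so it is never expanded to 1:x.
>>>    # Note that 'size' and '...' are ignored in this branch.
>>>    res <- x;
>>>  } else {
>>>    res <- sample(x, size=size, ...);
>>>  }
>>>  res;
>>> } # sampleFrom()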
>>>
>>> > sampleFrom(rep(10, times=2), size=2);
>>> [1] 10 10
>>>
>>> > sampleFrom(rep(10, times=1), size=1);
>>> [1] 10
>>>
>>> > sampleFrom(rep(0.31, times=1), size=1);
>>> [1] 0.31
>>>
>>> > sampleFrom(rep(-10, times=1), size=1);
>>> [1] -10
>>>
>>> > sampleFrom(rep(1.2, times=2), size=2);
>>> [1] 1.2 1.2
>>>
>>> > sampleFrom(rep(1.2, times=1), size=1);
>>> [1] 1.2
>>>
>>>
>>> I want to add sampleFrom() to the wishlist of functions to be
>>> available in default R.  Alternatively, one can add an argument
>>> 'sampleFrom=FALSE' to the existing sample() function.  Eventually such
>>> an argument can be made TRUE by default.
>>>
>>> /Henrik
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>>
>>
>> --
>> Henrique Dallazuanna
>> Curitiba-Paraná-Brasil
>> 25° 25' 40" S 49° 16' 22" O
>>
>>
>


