[Rd] Add-on argument in sample()

Jon Skoien jon.skoien at jrc.ec.europa.eu
Thu Jun 18 12:10:47 CEST 2015



On 6/18/2015 12:25 AM, Hervé Pagès wrote:
> Hi,
>
> Special behavior of sample(x, ...) when length(x) is 1 is of course
> a bad feature. I think it pre-dates sample.int() which is what people
> should use these days if they want the behavior of sample(x, ...) when
> length(x) is 1. And because we now have sample.int(), this feature
> could in theory be removed from sample(). Unfortunately this would
> break a lot of existing code so a warning or some kind of notification
> would need to be implemented.
>
> Even if the cost is high, that still sounds better/cleaner to me than
> adding an extra argument to sample() to control this (which is only
> going to be used by people aware of the problem but people aware of
> the problem already know how to work around it).
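
For reference, the usual workaround is to sample indices with sample.int() and use them to subset x, much like the resample() example in ?sample. A minimal sketch (resample is a locally defined helper here, not a base R function):

```r
# Sample from the elements of x itself, even when length(x) == 1.
# sample.int() draws indices from 1..length(x), so x is never expanded to 1:x.
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(10, size = 1)                    # always 10
resample(10, size = 3, replace = TRUE)    # 10 10 10
# resample(10, size = 3) signals an error, because the sample would be
# larger than the population when replace = FALSE.
```

With this helper, a population of length 1 is treated like any other population, so the error appears where one would expect it.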

I am generally skeptical of backward-incompatible changes, particularly
when no error will be thrown. On the other hand this might be one of the 
few cases where quite a lot of existing code will suddenly work 
correctly after a change... But unfortunately it is the ones who 
actually read the documentation who will get their code broken.

Cheers,
Jon

>
> Cheers,
> H.
>
> On 06/17/2015 01:27 AM, Jon Skoien wrote:
>>
>>
>> On 6/16/2015 1:32 PM, Peter Meissner wrote:
>>> On .06.2015, at 14:55, Millot Gael <Gael.Millot at curie.fr> wrote:
>>>
>>>> Hi.
>>>>
>>>> I have a problem with the default behavior of sample(), which performs
>>>> sample(1:x) when x is a single value.
>>>> This behavior is well explained in ?sample.
>>>> However, this behavior is annoying when the number of values is not
>>>> predictable. Would it be possible to add an argument
>>>> that deactivates this and performs the sampling on a single value?
>>>> Examples of the behavior I would like:
>>>>> sample(10, size = 1, replace = FALSE)
>>>> 10
>>>>
>>>>> sample(10, size = 3, replace = TRUE)
>>>> 10 10 10
>>>>
>>>>> sample(10, size = 3, replace = FALSE)
>>>> Error
>>>
>>> I think the problem here is that the function actually does what you
>>> would expect it to do from a statistical perspective. A sample of size
>>> three from a population of one, without allowing elements that were
>>> already drawn to be drawn again, is simply not defined. What should the
>>> function give back?
>>
>>
>> If I understand correctly, this error is exactly what the poster would like
>> to see, but it is not what you get currently. If length(population) == 1,
>> you currently sample from 1:population, not from the population itself. So:
>>
>>  > sample(8:10, 3, replace = FALSE)
>> [1] 10  8  9
>>  > sample(9:10, 3, replace = FALSE)
>> Error in sample.int(length(x), size, replace, prob) :
>>    cannot take a sample larger than the population when 'replace = FALSE'
>>  > sample(10:10, 3, replace = FALSE)
>> [1]  8 10  2
>>
>> I have to admit that I also find this behaviour inconsistent, even if it
>> is well described already on the first line of the details in the
>> documentation. It is definitely a feature which can cause some trouble,
>> and where the tests might end up more complicated than you would first
>> think.
>>
>>
>>>
>>> ... You can always wrap your code in a try() like this to prevent errors
>>> from breaking loops or functions:
>>>
>>> try(sample(...))
>>
>> No error is given when length(population) == 1, and the result might be
>> perfectly valid if population is variable. So this will easily stay in
>> the script as an undetected bug.
>>
>>>
>>> ... or you might check your arguments before execution:
>>>
>>>
>>> if ( !replace && length(population) >= size ){
>>>    sample(population, size = size , replace = replace)
>>> }else{
>>>    ...
>>> }
>>
>> This test is not sufficient if length(population) == size == 1, so you
>> will also need to check for this special case:
>>
>> if (length(population) == 1 && size == 1) {
>>    population
>> } else if (!replace && length(population) >= size) {
>>    sample(population, size = size, replace = replace)
>> } else {
>>    ...
>> }
>>
>> Then the question would be whether this test could be replaced with a new
>> argument to sample, e.g. expandSingle, which has TRUE as default for
>> backward compatibility, but FALSE if you don't want population to be
>> expanded to 1:population. It could certainly be useful in some cases,
>> but you still need to know about the expansion to use it. I think most
>> of these bugs occur because users did not think about the expansion in
>> the first place or did not realize that their population could be of
>> length 1 in some situations. These users would therefore not think about
>> changing the argument either.
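
To make the idea concrete, here is a rough sketch of how such an argument might behave, written as a wrapper rather than as a change to sample() itself. The names sample2 and expandSingle are purely illustrative; nothing like this exists in base R:

```r
# Hypothetical wrapper: 'sample2' and 'expandSingle' are illustrative names
# and are not part of base R.
sample2 <- function(x, size, replace = FALSE, prob = NULL,
                    expandSingle = TRUE) {
  if (expandSingle) {
    # Delegate to sample(): a single number x >= 1 is treated as 1:x.
    if (missing(size)) {
      return(sample(x, replace = replace, prob = prob))
    }
    return(sample(x, size, replace, prob))
  }
  # Never expand: always draw from the elements of x via their indices.
  if (missing(size)) size <- length(x)
  x[sample.int(length(x), size, replace, prob)]
}

sample2(10, 1, expandSingle = FALSE)    # always 10
sample2(8:10, 3, expandSingle = FALSE)  # a permutation of 8, 9, 10
# sample2(10, 3, expandSingle = FALSE) signals an error instead of silently
# sampling from 1:10.
```

The default expandSingle = TRUE keeps the existing behaviour, so the caller still has to know about the expansion in order to switch it off.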
>>
>> Cheers,
>> Jon
>>
>>>
>>>
>>>>
>>>> Many thanks for your help.
>>>>
>>>> Best wishes,
>>>>
>>>> Gael Millot.
>>>>
>>>>
>>>> Gael Millot
>>>> UMR 3244 (IC-CNRS-UPMC) et Universite Pierre et Marie Curie
>>>> Equipe Recombinaison et instabilite genetique
>>>> Pav Trouillet Rossignol 5eme etage
>>>> Institut Curie
>>>> 26 rue d'Ulm
>>>> 75248 Paris Cedex 05
>>>> FRANCE
>>>> tel : 33 1 56 24 66 34
>>>> fax : 33 1 56 24 66 44
>>>> Email : gael.millot at curie.fr
>>>> http://perso.curie.fr/Gael.Millot/index.html
>>>>
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>>
>>> Best, Peter
>>>
>>
>

-- 
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Environment and Sustainability (IES)
Climate Risk Management Unit

Via Fermi 2749, TP 100-01,  I-21027 Ispra (VA), ITALY

jon.skoien at jrc.ec.europa.eu
Tel:  +39 0332 789205

Disclaimer: Views expressed in this email are those of the individual 
and do not necessarily represent official views of the European Commission.


