[R] Randomly remove condition-selected rows from a matrix

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Fri Jan 2 20:18:33 CET 2009


xxx wrote:
> On Fri, Jan 2, 2009 at 10:07 AM, Wacek Kusnierczyk
> <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>   
>> ...     'sample' takes a sample of the specified size from the elements of
>>     'x' using either with or without replacement.
>>
>>       x: Either a (numeric, complex, character or logical) vector of
>>          more than one element from which to choose, or a positive
>>          integer.
>>
>>    If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
>>     'x >= 1', sampling takes place from '1:x'.  _Note_ that this
>>     convenience feature may lead to undesired behaviour when 'x' is of
>>     varying length 'sample(x)'.  See the 'resample()' example below.
>> ...
>> yet the following works, even though x has length 1 and is *not* numeric:...
>> is this a bug in the code, or a bug in the documentation?
>>     
>
> I would guess it's a bug in the documentation.
>
>  

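possibly.  looking at the R code for sample, it's clear why
sample("foo") works: a character x fails the is.numeric test, so the call
falls through to the else branch, which simply indexes into x: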

function (x, size, replace = FALSE, prob = NULL)
{
    if (length(x) == 1 && is.numeric(x) && x >= 1) {
        if (missing(size))
            size <- x
        .Internal(sample(x, size, replace, prob))
    }
    else {
        if (missing(size))
            size <- length(x)
        x[.Internal(sample(length(x), size, replace, prob))]
    }
}
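
to make the branch selection concrete, here is a minimal sketch.  the
helper sample_branch is hypothetical (it is not part of R); it only repeats
the condition from the source above and reports which branch an argument
would take.

```r
# hypothetical helper, not part of R: applies the same test as the
# source of sample() quoted above and names the branch that would run
sample_branch <- function(x) {
    if (length(x) == 1 && is.numeric(x) && x >= 1) "1:x" else "elements of x"
}

sample_branch("foo")   # "elements of x": is.numeric("foo") is FALSE
sample_branch(5)       # "1:x": the documented convenience case
sample_branch(1.1)     # "1:x": numeric, length 1, >= 1
sample_branch(0.5)     # "elements of x": numeric, but not >= 1

sample("foo")          # "foo": a permutation of a one-element vector
```

so the length-1 special case is keyed on is.numeric, not on x being a
positive integer as the argument description says.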

what is also clear from the code is that the function has another,
arguably buggy, behaviour, due to the way the : operator treats a
non-integer upper bound:

sample(1.1)
# 1, not 1.1

this is consistent with

"
     If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
     'x >= 1', sampling takes place from '1:x'.
"

since 1:1.1 evaluates to 1 (the colon operator never goes past its upper
bound), but not with

"
       x: Either a (numeric, complex, character or logical) vector of
          more than one element from which to choose, or a positive
          integer.
"

both from ?sample.  the manual seems to be wrong with respect to the
implementation (1.1 is neither a vector of more than one element nor a
positive integer, yet it is accepted), and i find sample(1.1) returning 1
a design flaw.  (i guess the note "_Note_ that this convenience feature
may lead to undesired behaviour when 'x' is of varying length
'sample(x)'." is supposed to explain away such cases.)
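
a minimal sketch of the behaviour and of one way around it.  the wrapper
is modelled on the resample() example that ?sample points to; it assumes a
version of R that provides sample.int, and it is only an illustration, not
a proposed fix to sample itself.

```r
1:1.1                  # 1: the sequence stops before exceeding 1.1
sample(1.1)            # 1, as reported above: a draw from 1:1.1

# sketch modelled on the resample() example in ?sample:
# always index into x, so a length-one x is never read as an upper bound
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(1.1)          # 1.1
resample("foo")        # "foo"
resample(c(2.5, 7))    # the two values in random order
```

the design choice is to sample positions (1 to length(x)) and never the
value of x, so the result does not change meaning when x happens to have
length 1.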

vQ



