[R] Randomly remove condition-selected rows from a matrix

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Fri Jan 2 16:07:01 CET 2009


Stavros Macrakis wrote:
> On Wed, Dec 31, 2008 at 12:44 PM, Guillaume Chapron
> <carnivorescience at gmail.com> wrote:
>   
>>> m[-sample(which(m[,1]<8 & m[,2]>12),2),]
>>>       
>> Supposing I sample only one row among the ones matching my criteria. Then
>> consider the case where there is just one row matching these criteria. Sure,
>> there is no need to sample, but the instruction would still be executed.
>> Then if this row index is 15, my instruction becomes sample(15,1), and this
>> can give me any row from 1 to 15, which is not correct. I have to add a
>> condition for the case where only one row matches the criteria.
>>     
>
> Yes, this is a (documented!) design flaw in 'sample' -- see the man page.
>
> For some reason, the designers of R have chosen to document the flaw
> and leave it up to individual users to work around it rather than fix
> it definitively.  A related case is sample(c(),0), which gives an
> error rather than giving an empty vector, though in general R deals
> with empty vectors correctly (e.g. sum(c()) => 0).
>
>   
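A minimal sketch of one way to guard the row-removal idiom quoted above. The helper name drop_random_rows and the toy matrix are hypothetical, and the sketch assumes a version of R that provides sample.int(). It samples positions within the index vector rather than the index values themselves, so a single matching row is not treated as "1:x".

```r
# hypothetical helper: remove up to n randomly chosen rows of m
# among those for which the logical vector 'cond' is TRUE
drop_random_rows <- function(m, cond, n) {
  idx <- which(cond)
  k <- min(n, length(idx))
  if (k == 0) return(m)                      # m[-integer(0), ] would drop every row
  chosen <- idx[sample.int(length(idx), k)]  # sample positions, not values
  m[-chosen, , drop = FALSE]
}

# toy data: only row 4 satisfies m[,1] < 8 & m[,2] > 12
m <- cbind(c(9, 10, 9, 1), c(20, 1, 5, 20))
drop_random_rows(m, m[, 1] < 8 & m[, 2] > 12, 1)   # removes row 4 and no other
```

The early return matters as much as the sampling fix: with no matching rows, a negative index of length zero selects nothing, so the unguarded expression would return an empty matrix.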

interestingly, ?sample says:

"
     'sample' takes a sample of the specified size from the elements of
     'x' using either with or without replacement.

       x: Either a (numeric, complex, character or logical) vector of
          more than one element from which to choose, or a positive
          integer.

     If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
     'x >= 1', sampling takes place from '1:x'.  _Note_ that this
     convenience feature may lead to undesired behaviour when 'x' is of
     varying length 'sample(x)'.  See the 'resample()' example below.

"

yet the following works, even though x has length 1 and is *not* numeric:

x = "foolme"
is.numeric(x)   # FALSE
sample(x, 1)    # "foolme"
sample(x)       # "foolme"

x = NA
is.numeric(x)   # FALSE: a bare NA is logical
sample(x, 1)    # NA
sample(x)       # NA

is this a bug in the code, or a bug in the documentation?
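One way to read the quoted Details paragraph is as a three-part test on x. The predicate below is only a paraphrase of that documented condition, written as a sketch. The name uses_one_to_x is hypothetical, and this is not claimed to be the actual source of sample().

```r
# hypothetical predicate paraphrasing the condition in the quoted Details text:
# length 1, numeric, and >= 1
uses_one_to_x <- function(x) {
  isTRUE(length(x) == 1 && is.numeric(x) && x >= 1)
}

uses_one_to_x(15)         # TRUE:  documented to sample from 1:15
uses_one_to_x("foolme")   # FALSE: length 1 but not numeric
uses_one_to_x(NA)         # FALSE: a bare NA is logical, not numeric
uses_one_to_x(c(3, 15))   # FALSE: length 2
```

Under that reading, the length-1 character and NA cases simply fall outside the "1:x" branch, which would be consistent with the behaviour shown above and would point at the wording of the argument description rather than at the code.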



> To my mind, it is bizarre to have an important basic function which
> works for some argument lengths but not others.  The convenience of
> being able to write sample(5,2) for sample(1:5,2) hardly seems worth
> inflicting inconsistency on all users -- but perhaps one of the
> designers of R/S can enlighten us on the design rationale here.
>
>   

hopefully.
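In the meantime, a workaround in the spirit of the resample() example that ?sample points to can be sketched as follows. The name resample_sketch is hypothetical, and the sketch assumes a version of R that provides sample.int(). Because it samples positions in x, it behaves the same way for vectors of length 0, 1, or more.

```r
# hypothetical helper: sample from the elements of x by position,
# so a length-1 numeric x is not expanded to 1:x
resample_sketch <- function(x, size = length(x)) {
  x[sample.int(length(x), size)]
}

resample_sketch(c(3, 15), 1)     # 3 or 15
resample_sketch(15, 1)           # always 15, never a value from 1:14
resample_sketch(integer(0), 0)   # empty vector, no error
resample_sketch("foolme")        # "foolme"
```

The trade-off is losing the sample(5, 2) shorthand; callers who want it would write resample_sketch(1:5, 2) instead.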

vQ



