[R] Odd behaviour of removing 'nothing' from an array or data frame

Duncan Murdoch murdoch at stats.uwo.ca
Tue Oct 31 16:10:37 CET 2006


On 10/31/2006 9:50 AM, Richard.Cotton at hsl.gov.uk wrote:
> Thanks for the reply Peter, though I'm not quite convinced.
> 
>> > #dubious.records = integer(0)
>> > identical(dubious.records, -dubious.records)
>> [1] TRUE
> 
>> how can peoples.heights[-dubious.records,] be different from
>> peoples.heights[dubious.records,]? 
> 
> Tell me if I'm being willfully ignorant here, but I'm sure they should be 
> different.  In the first case, the minus sign represents subtraction, so 
> it is correct that dubious.records and -dubious.records are identical.
> 
> However, in the second case, inside the square brackets, the minus sign 
> represents set complement, not subtraction, so dubious.records and 
> -dubious.records are not the same.
> 
> If x = runif(10), then x[-c(2,3,5)] means "remove from x the values at the 
> second, third and fifth position".
> 
> By extension x[-integer(0)] should mean "remove from x no values", and not 
> "remove from x all values", which is the current behaviour.

As Peter said, it's the value that counts, not the way you calculated 
it.  If you print -c(2,3,5) you get three negative numbers.  If you 
print -integer(0), you don't get any.  The first case is asking for 
elements to be left out; the second is just an empty index, so it 
selects nothing.
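
A quick way to check this at the console is sketched below. The vector x 
and its values are made up for illustration and are not from the thread.

```r
x <- c(10, 20, 30, 40, 50)   # illustrative values only

neg   <- -c(2, 3, 5)         # three negative numbers
empty <- -integer(0)         # still a zero-length integer vector

kept <- x[neg]               # leaves out positions 2, 3 and 5
none <- x[empty]             # an empty index selects nothing

print(neg)
print(length(empty))
print(kept)
print(length(none))
```

The index only ever sees the value of the expression, so an empty vector 
carries no sign for it to act on.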

The moral of the story is to use logical indices rather than negative ones:

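# flag the rows whose recorded height is implausibly large
dubious.records <- peoples.heights$heights > 2.5
# keep only the rows that were not flagged
peoples.heights <- peoples.heights[!dubious.records, ]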

works for any number of dubious records, including zero.
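
A minimal sketch of the zero-record case, comparing the two styles of 
index. The data frame contents and the names kept.logical and 
kept.negative are hypothetical; only the column name heights and the 2.5 
threshold come from the example above.

```r
# Hypothetical data with no dubious records at all
peoples.heights <- data.frame(
  name    = c("a", "b", "c"),
  heights = c(1.6, 1.8, 1.7)
)

dubious.records <- peoples.heights$heights > 2.5   # all FALSE here

# Logical index: negating all-FALSE gives all-TRUE, so every row is kept
kept.logical <- peoples.heights[!dubious.records, ]

# Negative integer index: which() returns integer(0), and -integer(0) is
# still an empty index, so no rows are selected
kept.negative <- peoples.heights[-which(dubious.records), ]

print(nrow(kept.logical))
print(nrow(kept.negative))
```

The logical form keeps all three rows, while the negative form returns a 
data frame with no rows, which is the behaviour the original question ran 
into.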

Duncan Murdoch


