[R] work with a subset of the dataset

Marc Schwartz marc_schwartz at me.com
Fri Jan 13 05:51:08 CET 2012


Just to be precise in language, the use of:

  age %in% 20:30

is only going to match exact integer values of age from 20 to 30:

> 20:30
 [1] 20 21 22 23 24 25 26 27 28 29 30

That is not the same as matching any value between 20 and 30, as Michael noted and as our respective examples would do.
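
A minimal sketch of the difference, assuming a made-up data frame 'DF' with a few non-integer ages (the values are invented purely for illustration):

```r
# Hypothetical data: the ages are invented, not taken from the OP's dataset
DF <- data.frame(age = c(19, 20, 20.5, 25, 29.9, 30, 31))

# %in% tests membership in the set {20, 21, ..., 30}: exact matches only
exact <- subset(DF, age %in% 20:30)

# The comparison keeps every value in the closed interval [20, 30]
ranged <- subset(DF, (age >= 20) & (age <= 30))

exact$age    # 20 25 30
ranged$age   # 20.0 20.5 25.0 29.9 30.0
```

The rows with ages 20.5 and 29.9 are dropped by the %in% version but kept by the range comparison.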

HTH,

Marc Schwartz

On Jan 12, 2012, at 10:32 PM, Schreiber, Stefan wrote:

> Thanks for the warning!
> 
> Better use Michael's or Marc's suggestion instead.
> 
> 
> Stefan
> 
> -----Original Message-----
> From: R. Michael Weylandt [mailto:michael.weylandt at gmail.com]
> Sent: Thu 1/12/2012 9:05 PM
> To: Schreiber, Stefan
> Cc: Marc Schwartz; r-help at r-project.org; manu79
> Subject: Re: [R] work with a subset of the dataset
> 
> Be careful: I think that's only going to check exact equality: i.e.,
> it won't find 20.5, but it also won't find 19.9999999999997 which you
> might get when you mean 20 due to floating point error. If the OP has
> non-integer data, this will cause trouble.
> 
> Michael
> 
> PS -- you also don't need the call to `c` -- there's nothing the 20:30
> sequence is being combined with.
> 
> On Thu, Jan 12, 2012 at 10:50 PM, Schreiber, Stefan
> <Stefan.Schreiber at ales.ualberta.ca> wrote:
>> Or with what I just learned:
>> 
>>  subset <- mydata[mydata$age %in% c(20:30), ]
>> 
>> Thanks for explaining Michael!
>> 
>> Stefan
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: r-help-bounces at r-project.org on behalf of Marc Schwartz
>> Sent: Thu 1/12/2012 8:38 PM
>> To: R. Michael Weylandt
>> Cc: r-help at r-project.org; manu79
>> Subject: Re: [R] work with a subset of the dataset
>> 
>> Presuming that 'DF' is the data frame, I am not sure what is wrong with
>> 
>>   NewDF <- subset(DF, (age >= 20) & (age <= 30))
>> 
>> presuming that 20 and 30 are to be included.
>> 
>> ?
>> 
>> Marc Schwartz
>> 
>> On Jan 12, 2012, at 9:26 PM, R. Michael Weylandt wrote:
>> 
>>> You can probably do it more easily with the subset() function but in
>>> my experience that often leads to more problems than solutions:
>>> perhaps try this.
>>> 
>>> idx <- with(DATA, which(age > 20 & age < 30))
>>> DATA[idx, ]
>>> 
>>> Michael
>>> 
>>> On Thu, Jan 12, 2012 at 5:25 PM, manu79 <manuelespino79 at hotmail.it> wrote:
>>>> Hello,
>>>> I have a big dataset with many variables and I would like to consider
>>>> only the rows in which a variable takes a specific value.
>>>> 
>>>> Here is an example to explain what I mean:
>>>> I have 5 variables describing a person: age, sex, weight, colour of hair,
>>>> colour of eyes.
>>>> I have 1000 rows (1000 persons) and I want to consider only the persons
>>>> whose age is between 20 and 30. How can I do this?
>>>> 
>>>> Thank you very much
>>>> M.


