[R] selections of data by one variable

Marc Schwartz MSchwartz at MedAnalytics.com
Wed May 4 14:56:01 CEST 2005


On Wed, 2005-05-04 at 11:14 +0000, Tu Yu-Kang wrote:
> Dear R experts,
> 
> My problem is as follows:
> 
> Suppose I have a data frame d comprising two variables, a<-c(1:10) and
> b<-c(11:20).
> 
> I now want to select a subgroup according to the values of b.
> 
> I know if I just want to select, say, b=17, I can use f<-d[d$b==17, ] and R 
> will give me 
> 
> > f
>   a  b
> 7 7 17
> 
> However, if I now want to select a subgroup according to b==e, where 
> e<-c(13,15,17), then the same syntax doesn't work.
> 
> What is the correct way to do it?  My data have more than one million 
> subjects, and I want to select part of them according to their id numbers.
> 
> Your help will be highly appreciated.
> 
> Best regards,
> 
> Yu-Kang

You would want to use something like the following:

> df <- data.frame(a = 1:10, b = 11:20)

> df
    a  b
1   1 11
2   2 12
3   3 13
4   4 14
5   5 15
6   6 16
7   7 17
8   8 18
9   9 19
10 10 20

> df[df$b %in% c(13, 15, 17), ]
  a  b
3 3 13
5 5 15
7 7 17


See ?"%in%" for more information.
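
As a rough sketch of why the == version does not do what you want: == compares
element by element and recycles the shorter vector, whereas %in% asks, for each
value of b, whether it appears anywhere in the set. The names e, eq_hits and
in_hits below are only illustrative.

```r
df <- data.frame(a = 1:10, b = 11:20)
e <- c(13, 15, 17)

# `==` recycles e along df$b (13, 15, 17, 13, 15, ...), so it only finds
# values that happen to line up by position. R also warns here, because
# 10 is not a multiple of 3; the warning is suppressed for this demo.
eq_hits <- suppressWarnings(df$b == e)

# `%in%` tests each element of df$b for membership in e.
in_hits <- df$b %in% e

which(eq_hits)  # position-wise matches only
which(in_hits)  # every row whose b is in e
```

With the toy data above, only row 5 lines up by position, while %in% picks
out rows 3, 5 and 7.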

Also, see ?subset for more flexibility in using complex boolean
expressions for subsetting.
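
A minimal sketch of the subset() approach, assuming the same toy data frame as
above. The vector name ids and the extra condition on a are made up here purely
for illustration.

```r
df <- data.frame(a = 1:10, b = 11:20)
ids <- c(13, 15, 17)  # hypothetical set of id numbers to keep

# Same selection as df[df$b %in% ids, ], with column names
# evaluated inside the data frame.
sel <- subset(df, b %in% ids)

# Conditions can be combined, and `select` restricts the columns returned.
sel2 <- subset(df, b %in% ids & a > 3, select = a)

sel
sel2
```

Note that ?subset describes it as a convenience function meant mainly for
interactive use, so the bracket form shown earlier may be the safer choice
inside your own functions.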

HTH,

Marc Schwartz



