[R] subsetting data-frame by vector of characters

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Fri Jun 13 16:51:21 CEST 2008


james perkins wrote:
> Thanks a lot for that. It's the %in% I needed to work out, mainly.
>
> By "large" I didn't mean anything in particular, just that it gets
> quite long with the real data.
> I did mean: names = c("John", "Phil", "Robert")
>
> The only problem with the method you suggest is that I lose
> the indexing, i.e. in the example, instead of:
>
> (index)    Name    Fave.Number
> 1    John    7
> 2    Phil    14
> 3    Robert    23
>
>
> I end up with
>
>
> (index) Name Fave.Number
> 1     John     7
> 3     Phil     14
> 5     Robert 23
>
> This isn't a problem at the moment but I guess it could be if I used
> the table later in loops. Is there an easy way to re-index the table?
>
Notice that these are row names, not row numbers: result[2,1] is "Phil"
in both cases. If it bothers you, just set rownames(result) <- NULL
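
To make that concrete, here is a small sketch. The data frame is rebuilt
from the example in this thread; stringsAsFactors = FALSE is my own
choice, so that Name stays a character column in any R version.

```r
# Rebuild the example data frame from the thread (assumed layout).
names.and.numbers <- data.frame(
  Name = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)
names <- c("John", "Phil", "Robert")

result <- names.and.numbers[names.and.numbers$Name %in% names, ]

rownames(result)   # "1" "3" "5": labels carried over from the original rows
result[2, 1]       # "Phil": numeric indexing goes by position, not by label

rownames(result) <- NULL   # reset to the default labels
rownames(result)   # "1" "2" "3"
```

So a later loop over seq_len(nrow(result)) should work the same with or
without the reset; the reset only changes what gets printed.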

BTW, are your names unique? In that case you could set them as rownames
and use them for indexing:

rownames(names.and.numbers) <- names.and.numbers$Name
names.and.numbers[names, ]
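
A sketch of how the two approaches differ, again with the data frame
rebuilt from the example in this thread (by.name and by.in are just
names I made up for illustration). Indexing by rownames returns rows in
the order of the lookup vector, whereas %in% keeps the order of the data
frame:

```r
# Rebuild the example data frame from the thread (assumed layout).
names.and.numbers <- data.frame(
  Name = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)
names <- c("Robert", "John", "Phil")   # deliberately not in data-frame order

# This assignment fails with an error if Name contains duplicates.
rownames(names.and.numbers) <- names.and.numbers$Name

by.name <- names.and.numbers[names, ]
by.name$Name   # "Robert" "John" "Phil": follows the order of `names`

by.in <- names.and.numbers[names.and.numbers$Name %in% names, ]
by.in$Name     # "John" "Phil" "Robert": follows the order of the data frame
```

One thing to watch with the rowname version: a name that is not present
in the data frame gives a row of NAs rather than being dropped silently.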

> Kind regards
>
> Jim
>
> Wacek Kusnierczyk wrote:
>> james perkins wrote:
>>  
>>> Hi,
>>>
>>> I have a very simple problem but I can't think how to solve it without
>>> using a for loop and creating a large logical vector. However given
>>> the nature of the problem I am sure there is a "1-liner" that could do
>>> the same thing much more efficiently.
>>>
>>> Basically I have a dataframe with characters in, e.g.
>>>
>>>    
>>>> names.and.numbers
>>>>       
>>> (index)    Name    Fave.Number
>>> 1    John    7
>>> 2    Tony    12
>>> 3    Phil    14
>>> 4    Adam    22
>>> 5    Robert    23
>>>
>>>
>>> Now, imagine I have a vector of names, ie:
>>>
>>>    
>>>> names = c("John,Phil,Robert")
>>>>       
>>
>> this is a one-element vector holding a single string of
>> comma-separated names.
>> or did you mean:  names = c("John", "Phil", "Robert")
>>
>>
>>  
>>> All I want to do is get the subset of the dataframe which corresponds
>>> to the names in the vector "names", i.e.
>>>
>>> (index)    Name    Fave.Number
>>> 1    John    7
>>> 2    Phil    14
>>> 3    Robert    23
>>>     
>>
>> this should do:
>> names.and.numbers[names.and.numbers$Name %in% names,]
>>
>> if names is as you say above, do
>> names.and.numbers[names.and.numbers$Name %in% unlist(strsplit(names,",")), ]
>>
>> you do create a logical vector here (what does 'large' mean?), but no
>> loop is involved at the surface.
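
For the record, a sketch of what the %in% line does, using the example
data from this thread (keep, joined and parts are names made up for
illustration). For the single-string case, note that strsplit() returns
a list, so the sketch flattens it with unlist() before matching:

```r
# Rebuild the example data frame from the thread (assumed layout).
names.and.numbers <- data.frame(
  Name = c("John", "Tony", "Phil", "Adam", "Robert"),
  Fave.Number = c(7, 12, 14, 22, 23),
  stringsAsFactors = FALSE
)
names <- c("John", "Phil", "Robert")

# One logical value per row: is this row's Name among the wanted names?
keep <- names.and.numbers$Name %in% names
keep   # TRUE FALSE TRUE FALSE TRUE
names.and.numbers[keep, ]

# If the names arrive as one comma-separated string, split it first.
joined <- "John,Phil,Robert"
parts <- unlist(strsplit(joined, ","))   # strsplit() gives a list of length 1
identical(parts, names)   # TRUE
```

The logical vector has one element per row of the data frame, so it is
only as "large" as the data frame is long.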
>>
>> vQ
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907


