[R] how to delete specific rows in a data frame where the first column matches any string from a list

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Fri Feb 6 21:50:45 CET 2009


Laura Rodriguez Murillo wrote:
> Thank you. I think grep would do it, but the list of expressions I
> need to match is too long so they are stored in a file. 

what does 'too long' mean?

> So the
> question would be how I can tell R to look into that file to look for
> the expressions that I want to match.
>   

i guess you may still successfully use r for this, but to me it sounds
like a perfect job for perl.  let me know if you need more help. 
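
for the r route, here is a minimal sketch, assuming the expressions are stored one per line in a plain text file. the file name, the column names and the sample values are made up for illustration. it reads the strings with readLines and then drops the rows whose second column either equals one of them (%in%) or contains one of them as a literal substring (grepl with fixed=TRUE):

```r
# hypothetical pattern file, one string per line
# (written to a temp file here so the example is self-contained)
pattern_file <- tempfile(fileext = ".txt")
writeLines(c("abc", "xyz"), pattern_file)

# dummy data; the second column is the one to test
data <- data.frame(id = 1:4,
                   key = c("abc", "def", "xyzzy", "ghi"),
                   stringsAsFactors = FALSE)

# read the strings to match
patterns <- readLines(pattern_file)

# exact match: drop rows whose second column equals any listed string
kept_exact <- data[!(data[, 2] %in% patterns), ]

# substring match: drop rows whose second column contains any listed string
hits <- Reduce(`|`,
               lapply(patterns, function(p) grepl(p, data[, 2], fixed = TRUE)),
               rep(FALSE, nrow(data)))
kept_substring <- data[!hits, ]
```

if the strings are meant to be whole values rather than regular expressions, the %in% form is probably the simpler choice, and it should cope with a long list without trouble.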

note, in the below, you'd use 'data[,2]' instead of 'd[,2]' (or 'd'
instead of 'data').  sorry for the typo.  mark, thanks for pointing this
out -- the more obvious the mistake, the less visible ;)

vQ


> Thank you again for your help
>
> Laura
>
> 2009/2/6 Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>:
>   
>> Laura Rodriguez Murillo wrote:
>>     
>>> Hi,
>>>
>>> I'm new in the mailing list but I would appreciate if you could help
>>> me with this:
>>> I have a big matrix from where I need to delete specific rows. The
>>> second entry on these rows to delete should match any string within a
>>> list (other file with just one column).
>>> Thank you so much!
>>>
>>>
>>>       
>> here's one way to do it, illustrated with dummy data:
>>
>> # dummy character matrix
>> data = matrix(replicate(20, paste(sample(letters, 20), collapse="")),
>> ncol=2)
>>
>> # drop rows where the second column matches 'a' (keep the non-matching rows)
>> data[-grep('a', d[,2]),]
>>
>> this will work also if your data is actually a data frame:
>>
>> data = as.data.frame(data)
>> data[-grep('a', d[,2]),]
>>
>> note, due to a known issue with grep, this won't work correctly if there
>> are *no* rows that match the pattern:
>>
>> data[-grep('1', d[,2]),]
>> # should return all of data, but returns an empty matrix
>>
>> with the upcoming version of r, grep will have an additional argument
>> which will make this problem easy to fix:
>>
>> data[grep('a', d[,2], invert=TRUE),]
>>
>>
>> vQ
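
a sketch of one way around the empty-match case quoted above that does not depend on the new argument: index with the logical vector from grepl instead of negating the integer positions from grep. the variable names empty_result and full_result are made up for illustration.

```r
# dummy character matrix of lowercase letters only, so '1' never matches
set.seed(1)
data <- matrix(replicate(20, paste(sample(letters, 20), collapse = "")),
               ncol = 2)

# negative integer indexing: grep() returns integer(0) when nothing matches,
# and indexing with -integer(0) selects zero rows
empty_result <- data[-grep("1", data[, 2]), , drop = FALSE]
nrow(empty_result)  # 0

# logical indexing: grepl() returns one TRUE/FALSE per row,
# so negating it keeps every row when nothing matches
full_result <- data[!grepl("1", data[, 2]), , drop = FALSE]
nrow(full_result)   # 10
```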



