[R] Help searching a matrix for only certain records

Matt Borkowski mathias1979 at yahoo.com
Sun Mar 3 15:22:44 CET 2013


Thank you for your response, Jim! I will give this one a try! But I have a couple of follow-up questions...

In my search for a solution, I had seen something stating match() is much more efficient than subset() and will cut down significantly on computing time. Is there any truth to that?

Also, I found the following solution, which works for matching a single condition, but I couldn't quite figure out how to modify it to search for both of my acceptable conditions...

> testdata <- testdata[testdata$REC.TYPE == "SAO",,drop=FALSE]
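
One way the single-condition form above might be extended to two acceptable values is with %in%, which base R defines in terms of match(). The following is only a sketch: the small data frame and its TEMP column are made-up stand-ins for the real data, and the trailing spaces in "SAO  " are assumed from the description in the original post.

```r
# Hypothetical stand-in for the real data; only REC.TYPE matters here.
testdata <- data.frame(
  REC.TYPE = c("SAO  ", "FL-15", "AUTO ", "SAO  ", "SOD  "),
  TEMP     = c(10.1, 11.3, 9.8, 12.0, 8.7),
  stringsAsFactors = FALSE
)

# Values to keep. The trailing spaces in "SAO  " are an assumption
# taken from the original post; adjust to the actual stored strings.
keep <- c("SAO  ", "FL-15")

# %in% is built on match(): it is TRUE wherever REC.TYPE equals
# any element of keep, so one vectorized test covers both conditions.
testdata <- testdata[testdata$REC.TYPE %in% keep, , drop = FALSE]

print(testdata)
```

Because the comparison is exact, a value stored as "SAO" without the padding would not be kept by this test.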

-Matt




--- On Sun, 3/3/13, jim holtman <jholtman at gmail.com> wrote:

From: jim holtman <jholtman at gmail.com>
Subject: Re: [R] Help searching a matrix for only certain records
To: "Matt Borkowski" <mathias1979 at yahoo.com>
Cc: r-help at r-project.org
Date: Sunday, March 3, 2013, 8:00 AM

Try this:

dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))
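
As a rough illustration of what the line above does, here is a small self-contained sketch. The data frame is invented for the example (the TEMP column and the "AUTO " record type are hypothetical), and the pattern is the one from the suggestion. Note that grepl() matches the pattern anywhere in the string, so "SAO " followed by a space also matches the padded value "SAO  ".

```r
# Hypothetical stand-in for the real data.
dataset <- data.frame(
  REC.TYPE = c("SAO  ", "FL-15", "AUTO ", "SAO  "),
  TEMP     = c(10.1, 11.3, 9.8, 12.0),
  stringsAsFactors = FALSE
)

# Regular expression with two alternatives: "SAO " or "FL-15".
pattern <- "(SAO |FL-15)"

# grepl() returns one logical per row: TRUE where the pattern is found.
hits <- grepl(pattern, dataset$REC.TYPE)

# subset() keeps only the rows where the condition is TRUE;
# REC.TYPE is looked up inside dataset.
dataset <- subset(dataset, grepl(pattern, REC.TYPE))

print(hits)
print(dataset)
```

If only whole-string matches are wanted, the pattern could be anchored with ^ and $, at the cost of having to spell out the padding exactly.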


On Sun, Mar 3, 2013 at 1:11 AM, Matt Borkowski <mathias1979 at yahoo.com> wrote:
> Let me start by saying I am rather new to R and generally consider myself to be a novice programmer...so don't assume I know what I'm doing :)
>
> I have a large matrix, approximately 300,000 x 14. It's essentially a 20-year dataset of 15-minute data. However, I only need the rows where the column I've named REC.TYPE contains the string "SAO  " or "FL-15".
>
> My horribly inefficient solution was to search the matrix row by row, test the REC.TYPE column and essentially delete the row if it did not match my criteria. Essentially...
>
>> j <- 1
>> for (i in 1:nrow(dataset)) {
>>    if(dataset$REC.TYPE[j] != "SAO  " && dataset$REC.TYPE[j] != "FL-15") {
>>      dataset <- dataset[-j,]  }
>>    else {
>>      j <- j+1  }
>> }
>
> After watching my code get through only about 10% of the matrix in an hour and slowing with every row...I figure there must be a more efficient way of pulling out only the records I need...especially when I need to repeat this for another 8 datasets.
>
> Can anyone point me in the right direction?
>
> Thanks!
>
> Matt
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



