[R] Searching for specific values in a matrix

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Jul 27 22:00:19 CEST 2009


On Jul 27, 2009, at 3:50 PM, Mehdi Khan wrote:

> the problem is, it works with the example data i gave.  however, it  
> does NOT work with the data set i have, which is 600,000 rows.  the  
> class is still a data frame.

So the problem must be in your data, or what you think is in your  
data. Somehow you're constructing a "boolean query" that returns false  
for every row. As long as you're not getting any memory errors, the  
size of your data doesn't change the mechanics of how this would work.

I suspect you're not getting <0 rows> for every possible query you can  
come up with, right?

Look at the first 10 lines of your dataset and try to select some rows  
from your entire data.frame by using values you can see in the first  
10 rows you've just looked at.

I'm expecting this would work, in which case I'm not sure how much  
more help I can provide.
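
If it doesn't, a quick check along these lines might narrow things down. This is only a sketch on made-up data: `dat` is a stand-in for your data.frame, and the stray trailing space is an assumption about what could be wrong, not something I know about your file.

```r
# Hypothetical stand-in for the real data; note the trailing space in "BC ".
dat <- data.frame(
  attr1 = c("D", "BC ", "B", "BC", NA),
  LAT   = c(38.27645, 38.36304, 41.67121, 38.93073, 38.30988),
  stringsAsFactors = FALSE
)

# Inspect what R thinks is in each column (types, values).
str(dat)
head(dat, 10)

# Count the rows that pass the test; na.rm = TRUE ignores the NA results.
n_exact <- sum(dat$attr1 == "BC", na.rm = TRUE)

# Same test after stripping leading/trailing whitespace.
n_trimmed <- sum(trimws(dat$attr1) == "BC", na.rm = TRUE)

n_exact
n_trimmed
```

If the two counts differ, the values in the column are not quite what the printout suggests.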

-steve


> On Mon, Jul 27, 2009 at 12:15 PM, Steve Lianoglou <mailinglist.honeypot at gmail.com> wrote:
>
> On Jul 27, 2009, at 2:54 PM, Mehdi Khan wrote:
>
> i am able to return the first column, but anything else returns this:
> <0 rows> (or 0-length row.names)
>
> any idea?
>
> I'm not sure what you're doing.
>
> The result you're getting happens when no rows "pass" the logical  
> test that you are using to index the rows of your data.frame.
>
> Can you show the code that you are using (based on the example data  
> you gave) that is giving you the <0 rows> result?
>
> -steve
>
>
>
> On Tue, Jul 21, 2009 at 12:49 PM, Steve Lianoglou <mailinglist.honeypot at gmail.com> wrote:
>
> On Jul 21, 2009, at 3:27 PM, Mehdi Khan wrote:
>
> I understand your explanation about the test for even numbers.   
> However I am still a bit confused as to how to go about finding a  
> particular value.  Here is an example data set
>
> col #          attr1    attr2   attr 3    LON        LAT
> 17209         D        NA    NA -122.9409 38.27645
> 17210        BC        NA    NA -122.9581 38.36304
> 17211         B        NA    NA -123.6851 41.67121
> 17212        BC        NA    NA -123.0724 38.93073
> 17213         C        NA    NA -123.7240 41.84403
> 17214      <NA>       464    NA -122.9430 38.30988
> 17215         C        NA    NA -123.4442 40.65369
> 17216        BC        NA    NA -122.9389 38.31551
> 17217         C        NA    NA -123.0747 38.97998
> 17218         C        NA    NA -123.6580 41.59610
> 17219         C        NA    NA -123.4513 40.70992
> 17220         C        NA    NA -123.0901 39.06473
> 17221        BC        NA    NA -123.0653 38.94845
> 17222        BC        NA    NA -122.9464 38.36808
> 17223      <NA>       464    NA -123.0143 38.70205
> 17224      <NA>        NA     5 -122.8609 37.94137
> 17225      <NA>        NA     5 -122.8628 37.95057
> 17226      <NA>        NA     7 -122.8646 37.95978
>
> For future reference, perhaps paste this in a way that's easy for us  
> to paste into a running R session so we can use it, like so:
>
> df <- data.frame(
>   coln=c(17209, 17210, 17211, 17212, 17213, 17214, 17215, 17216,
>          17217, 17218, 17219, 17220, 17221, 17222, 17223, 17224, 17225, 17226),
>   attr1=c("D","BC","B","BC","C",NA,"C","BC","C","C","C","C","BC","BC",NA,NA,NA,NA),
>   attr2=c(NA,NA,NA,NA,NA,464,NA,NA,NA,NA,NA,NA,NA,NA,464,NA,NA,NA),
>   attr3=c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,5,5,7),
>   LON=c(-122.9409,-122.9581,-123.6851,-123.0724,-123.7240,-122.9430,
>         -123.4442,-122.9389,-123.0747,-123.6580,-123.4513,-123.0901,
>         -123.0653,-122.9464,-123.0143,-122.8609,-122.8628,-122.8646),
>   LAT=c(38.27645,38.36304,41.67121,38.93073,41.84403,38.30988,
>         40.65369,38.31551,38.97998,41.59610,40.70992,39.06473,
>         38.94845,38.36808,38.70205,37.94137,37.95057,37.95978))
>
>
> If I wanted to find the row with Lat = 37.95978
>
> Using an "indexing vector":
>
> R> lats <- df$LAT == 37.95978
> # or with the %~% from before:
> # lats <- df$LAT %~% 37.95978
> R> df[lats,]
>   coln attr1 attr2 attr3       LON      LAT
> 18 17226  <NA>    NA     7 -122.8646 37.95978
>
> Using the "subset" function:
>
> R> subset(df, LAT == 37.95978)
>   coln attr1 attr2 attr3       LON      LAT
> 18 17226  <NA>    NA     7 -122.8646 37.95978
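
One caveat with `==` on doubles: the console prints only about 7 significant digits by default, so a value copied from the printout may not be exactly equal to what is stored. The `%~%` operator from earlier in the thread isn't reproduced in this message, so below is a minimal stand-in, assuming an absolute tolerance. The name `near` and the default `tol` are choices made for illustration only.

```r
# Hypothetical helper: TRUE where x is within tol of target; NAs count as FALSE.
near <- function(x, target, tol = 1e-5) {
  !is.na(x) & abs(x - target) < tol
}

# Toy data: the second latitude carries more digits than R prints by default.
lat <- c(38.27645, 37.959781234, NA)

# Exact comparison against the printed value finds nothing.
exact_hits <- which(lat == 37.95978)

# Tolerance comparison finds the second element.
near_hits <- which(near(lat, 37.95978))
```

With a data.frame, `df[near(df$LAT, 37.95978), ]` would then select the matching rows.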
>
>
> , how would i do that?  How would  I find the rows with BC?
>
> R> subset(df, attr1 == 'BC')
>   coln attr1 attr2 attr3       LON      LAT
> 2  17210    BC    NA    NA -122.9581 38.36304
> 4  17212    BC    NA    NA -123.0724 38.93073
> 8  17216    BC    NA    NA -122.9389 38.31551
> 13 17221    BC    NA    NA -123.0653 38.94845
> 14 17222    BC    NA    NA -122.9464 38.36808
>
>
> If you try with an "indexing vector" the NA's will trip you up:
>
> R> df[df$attr1 == 'BC',]
>     coln attr1 attr2 attr3       LON      LAT
> 2    17210    BC    NA    NA -122.9581 38.36304
> 4    17212    BC    NA    NA -123.0724 38.93073
> NA      NA  <NA>    NA    NA        NA       NA
> 8    17216    BC    NA    NA -122.9389 38.31551
> 13   17221    BC    NA    NA -123.0653 38.94845
> 14   17222    BC    NA    NA -122.9464 38.36808
> NA.1    NA  <NA>    NA    NA        NA       NA
> NA.2    NA  <NA>    NA    NA        NA       NA
> NA.3    NA  <NA>    NA    NA        NA       NA
> NA.4    NA  <NA>    NA    NA        NA       NA
>
> So you could do something like:
>
> R> df[df$attr1 == 'BC' & !is.na(df$attr1),]
>   coln attr1 attr2 attr3       LON      LAT
> 2  17210    BC    NA    NA -122.9581 38.36304
> 4  17212    BC    NA    NA -123.0724 38.93073
> 8  17216    BC    NA    NA -122.9389 38.31551
> 13 17221    BC    NA    NA -123.0653 38.94845
> 14 17222    BC    NA    NA -122.9464 38.36808
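
Another way to keep the NAs out of the result is to wrap the test in which(), which returns only the positions where the test is TRUE. A small sketch on a cut-down version of the example data (`small` and `bc_rows` are made-up names):

```r
# Cut-down, hypothetical version of the example data with one NA in attr1.
small <- data.frame(
  coln  = c(17209, 17210, 17214, 17216),
  attr1 = c("D", "BC", NA, "BC"),
  stringsAsFactors = FALSE
)

# which() keeps the indices of TRUE values and skips both FALSE and NA.
bc_rows <- small[which(small$attr1 == "BC"), ]
bc_rows
```

This gives the same rows as adding `& !is.na(...)` to the test, just with less typing.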
>
>
> HTH,
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Physiology, Biophysics and Systems Biology
> Weill Medical College of Cornell University
>
> Contact Info: http://cbio.mskcc.org/~lianos/contact

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
   |  Memorial Sloan-Kettering Cancer Center
   |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



