[R] Subsetting on multiple criteria (AND condition) in R

Marc Schwartz marc_schwartz at me.com
Tue Jan 14 22:05:19 CET 2014


On Jan 14, 2014, at 1:38 PM, Jeff Johnson <mrjefftoyou at gmail.com> wrote:

> I'm running the following to get what I would expect is a subset of
> countries that are not equal to "US" AND COUNTRY is not in one of my
> validcountries values.
> 
> non_us <- subset(mydf, (COUNTRY %in% validcountries) & COUNTRY != "US",
> select = COUNTRY, na.rm=TRUE)
> 
> however, when I then do table(non_us) I get:
>> table(non_us)
> non_us
>    AE AN AR AT AU BB BD BE BH BM BN BO BR BS CA CH CM CN CO CR CY DE DK DO EC ES
>  0  3  0  2  1 31  4  1  1  1 45  1  1  4  5 86  3  1  8  1  2  1  8  2  1  2  4
> FI FR GB GR GU HK ID IE IL IN IO IT JM JP KH KR KY LU LV MO MX MY NG NL NO NZ PA
>  2  4 35  3  3 14  3  5  2  5  1  2  1 15  1 11  2  2  1  1 23  7  1  6  1  3  1
> PE PG PH PR PT RO RU SA SE SG TC TH TT TW TZ US ZA
>  2  1  1  8  1  1  1  1  1 18  1  1  2 11  1  0  3
>> 
> 
> Notice US appears as the second to last. I expected it to NOT appear.
> 
> Do you know if I'm using incorrect syntax? Is the & symbol equivalent to
> AND (notice I have 2 criteria for subsetting)? Also, is COUNTRY != "US"
> valid syntax? I don't get errors, but then again I don't get what I expect
> back.
> 
> Thanks in advance!
> 
> 
> 
> -- 
> Jeff


Review the Details section of ?subset, where you will find the following:

"Factors may have empty levels after subsetting; unused levels are not automatically removed. See droplevels for a way to drop all unused levels from a data frame."


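Your syntax is fine and the behavior is as expected. The rows where COUNTRY is "US" were removed (note the count of 0 under US in your table), but "US" is still a level of the COUNTRY factor, so table() continues to list it.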
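
As a minimal sketch of the point above, the following uses a small made-up data frame standing in for mydf and a hypothetical validcountries vector (neither is the poster's actual data). It shows the empty level surviving subset() and then being removed with droplevels():

```r
# Hypothetical stand-ins for the poster's mydf and validcountries
mydf <- data.frame(
  COUNTRY = factor(c("US", "CA", "US", "GB", "CA", "ZZ"))
)
validcountries <- c("US", "CA", "GB")

# Same two conditions as in the question, combined with the elementwise &.
# na.rm is omitted: subset() has no such argument, and it already treats
# rows where the condition is NA as FALSE.
non_us <- subset(mydf, (COUNTRY %in% validcountries) & COUNTRY != "US",
                 select = COUNTRY)

# The "US" and "ZZ" rows are gone, but both remain as factor levels,
# so table() reports them with a count of 0.
levels(non_us$COUNTRY)
table(non_us$COUNTRY)

# droplevels() removes unused levels from every factor in the data frame.
non_us_clean <- droplevels(non_us)
table(non_us_clean$COUNTRY)
```

After droplevels(), only the levels that actually occur in the subset ("CA" and "GB" in this toy example) appear in the table.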

Regards,

Marc Schwartz



