[R] Potential problem with subset !!!

Arnaud Mosnier a.mosnier at gmail.com
Fri Feb 12 16:22:15 CET 2010


Dear useRs,

Just a short post to share the answer to a problem that took me
some time to resolve!
I hope that reading this will help others avoid the same error.

When using the subset function, writing

subset (data, data$columnname == X) or subset (data, columnname == X)

does the same thing.

Thus, the function treats any name given after the comma
(like "columnname") as the name of a column of the data frame
being subset, whenever a column with that name exists.
A problem occurs when another name in the condition, such as X, is both
the name of a column of the data frame and the name of an object in the
workspace!

Here is an example:

df <- data.frame(ID = c("a","b","c","b","e"), Other = 1:5)
ID <- unique (df$ID)
ID

## Now the potential problem!

subset (df, df$ID == ID[4])

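## BE CAREFUL: the subset function uses the column ID of the data frame
## and NOT the object ID containing the unique values!
## Here ID[4] is read from the column ("b"), not from the object ("e"),
## so rows 2 and 4 are returned instead of row 5.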
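
For anyone who wants to check this and work around it, here is a minimal sketch. The names unique_ids and target are only illustrative (they are not part of the example above), and these are just a few possible workarounds, not the only ones.

```r
df <- data.frame(ID = c("a", "b", "c", "b", "e"), Other = 1:5)
ID <- unique(df$ID)

# Inside subset(), ID is looked up in df first, so ID[4] means df$ID[4].
masked <- subset(df, df$ID == ID[4])

# Workaround 1 (illustrative name): store the unique values under a name
# that is not a column of df.
unique_ids <- unique(df$ID)
by_rename <- subset(df, ID == unique_ids[4])

# Workaround 2 (illustrative name): extract the value before calling
# subset(), again under a name that is not a column of df.
target <- ID[4]
by_value <- subset(df, ID == target)

# Workaround 3: index with [ ], where the condition is evaluated in the
# calling environment, so ID refers to the object.
by_bracket <- df[df$ID == ID[4], ]
```

Comparing masked with the other three results shows the difference: only masked uses the column ID when evaluating ID[4].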

Sorry if this seems obvious to some of you, but I hope that others find
it useful!

Arnaud


