[R] Potential problem with subset !!!

Douglas Bates bates at stat.wisc.edu
Fri Feb 12 16:35:03 CET 2010


On Fri, Feb 12, 2010 at 9:22 AM, Arnaud Mosnier <a.mosnier at gmail.com> wrote:
> Dear useRs,
>
> Just a short post with the answer to a problem that took me
> some time to resolve!
> I hope that reading this will help others avoid the same error.
>
> When using the subset function, writing
>
> subset (data, data$columnname == X) or subset (data, columnname == X)
>
> does the same thing.
>
> Thus, the function treats a name given after the comma
> (like "columnname") as the name of a column of the data frame
> in question.
> A problem occurs when another name in the expression, such as X, is
> both a column of the data frame and an object in the workspace!
>
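
A minimal sketch of that lookup order, using made-up data (dat, grp, val
and target are illustrative names, not taken from the post). Names in the
condition are looked up among the columns first and only then in the
calling environment, so both forms agree as long as there is no clash:

```r
# Hypothetical data frame; 'target' exists only in the workspace
dat <- data.frame(grp = c("a", "b", "a"), val = 1:3,
                  stringsAsFactors = FALSE)
target <- "a"

r1 <- subset(dat, dat$grp == target)  # column referenced explicitly
r2 <- subset(dat, grp == target)      # column referenced by bare name

# 'target' is not a column of dat, so it is found in the workspace
# and both calls select the same rows
same <- identical(r1, r2)
```
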
> Here is an example:
>
> df <- data.frame(ID = c("a","b","c","b","e"), Other = 1:5)
> ID <- unique (df$ID)
> ID
>
> ## Now the potential problem!
>
> subset (df, df$ID == ID[4])
>
> ## BE CAREFUL: inside subset(), ID[4] refers to the column ID of the
> ## data frame and NOT to the object ID containing the unique values!
>
> Sorry if this seems obvious to some of you, but I hope that others
> find it useful!

Myself, I think it would be obvious to anyone who had read the
documentation for subset, the third paragraph of which is

     For data frames, the ‘subset’ argument works on the rows.  Note
     that ‘subset’ will be evaluated in the data frame, so columns can
     be referred to (by name) as variables in the expression (see the
     examples).
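
To make that concrete, here is a minimal sketch reusing the data frame
from the example above. The names clash, wanted, fix1 and fix2 are
introduced here for illustration only, and stringsAsFactors = FALSE is
an added assumption so that the result does not depend on the R version:

```r
df <- data.frame(ID = c("a", "b", "c", "b", "e"), Other = 1:5,
                 stringsAsFactors = FALSE)
ID <- unique(df$ID)   # "a" "b" "c" "e" in the workspace

# Inside subset(), ID is the column, so ID[4] is df$ID[4], i.e. "b"
clash <- subset(df, ID == ID[4])

# One way around it: store the value under a name that is not a column
wanted <- ID[4]       # evaluated in the workspace, so "e"
fix1 <- subset(df, ID == wanted)

# Another: plain indexing, which is evaluated entirely in the workspace
fix2 <- df[df$ID == ID[4], ]
```

Here clash holds rows 2 and 4, while fix1 and fix2 both hold row 5 only.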

> Arnaud


