[R] Problem with factor state when subset()ing a data.frame

Roger Leigh rleigh at whinlatter.ukfsn.org
Thu Feb 8 22:51:39 CET 2007


Hi folks,

I am running into a problem when calling subset() on a large
data.frame.  One of the columns contains strings which are used as
a factor.  R seems to convert the column to a factor automatically
when the data.frame is constructed, and the factor's levels do not
appear to get updated when I create a subset of the table.

A minimal testcase to demonstrate the problem follows:


# ID is converted to a factor; stringsAsFactors = TRUE makes that explicit
sample <- data.frame(ID = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C"),
                     Value = c(5,3,5,3,6,7,8,3,2,6),
                     stringsAsFactors = TRUE)

print(sample)

sample.filtered <- subset(sample, ID != "B", select=c(ID, Value))
# Or, with the same result:
# sample.filtered <- subset(sample, ID != "B", select=c(ID, Value), drop=TRUE)

print(sample.filtered)

plot(sample.filtered)
plot(sample.filtered$Value ~ sample.filtered$ID)

print(levels(sample.filtered$ID))
print(levels(factor(sample.filtered$ID)))

plot(sample.filtered$Value ~ factor(sample.filtered$ID))


Am I doing something wrong here, or is this an R bug?  How can I get
the new data.frame to update the factor levels, so that the plot does
not show redundant "empty" levels for IDs that are no longer present?
(I also need to remove the unused levels for other analyses, and
re-factoring each column "by hand" seems a little redundant.)
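
For reference, the "by hand" workaround I mean looks roughly like the
sketch below.  drop_unused is just a name made up for illustration, and
the small data.frame is a cut-down stand-in for the testcase above.  It
re-applies factor() to every factor column, which appears to discard
the levels that no longer occur in the data:

```r
# Hypothetical helper (not part of R): re-create every factor column so
# that only the levels actually present in the data remain.
drop_unused <- function(df) {
  is_fac <- sapply(df, is.factor)            # which columns are factors?
  df[is_fac] <- lapply(df[is_fac], factor)   # factor() drops unused levels
  df
}

# Cut-down stand-in for the testcase; ID is created as a factor explicitly.
sample <- data.frame(ID = factor(c("A", "A", "B", "C")),
                     Value = c(5, 3, 6, 2))

sample.filtered <- drop_unused(subset(sample, ID != "B"))
print(levels(sample.filtered$ID))
```

This works, but it means wrapping every subset() call, which is the
part that feels redundant.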


Kind regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.