[R] Levels in returned data.frame after subset

Greg Snow Greg.Snow at imail.org
Sun Sep 5 01:14:02 CEST 2010


The advantage of computers is that they do exactly what they are told.
The disadvantage of computers is that they do exactly what they are told.

R is a set of instructions to the computer; those instructions are a combination of ones from the original programmers and ones from you.  Who should make important decisions about the structure of your data?  A group of (admittedly brilliant) programmers who have never seen your data and do not know what questions you are trying to answer, or you (who hopefully know more about your data and questions)?

I don't claim to be more intelligent/knowledgeable than the programmers of R, but I am grateful that they have/had sufficient humility to allow for the possibility that I may actually know something about my data and questions that they don't (or maybe they are just too lazy to do my job for me, but that is also appropriate).

In your example below, why do you care what the levels of gender are after the subset?  Why waste time/effort dropping the levels for a column that by definition only has one value?
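
For readers who do want the unused levels gone, here is a minimal sketch of the behavior under discussion. It is an illustration, not part of the original exchange, and it rests on two assumptions: gender is built explicitly with factor() (since R 4.0.0, data.frame() no longer converts strings to factors by default), and droplevels() is available, which is the case in recent versions of R. The names s2 and g are made up for the example.

```r
# Example data, with gender stored explicitly as a factor.
m <- data.frame(gender = factor(c("M", "M", "F")),
                ht = c(172, 186.5, 165),
                wt = c(91, 99, 74))

# Subsetting keeps the full set of levels, including the now-unused "F".
s <- subset(m, gender == "M")
levels(s$gender)
table(s$gender)   # the empty "F" category is still tabulated

# Dropping unused levels is an explicit choice made by the user.
s2 <- droplevels(s)
levels(s2$gender)

# For a single factor, drop = TRUE while subsetting has the same effect.
g <- m$gender[m$gender == "M", drop = TRUE]
levels(g)
```

Keeping the levels by default means a table or plot of the subset still shows the empty category, which is often what you want when comparing subsets; dropping them stays a deliberate, one-line decision.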

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Ulrik Stervbo
> Sent: Saturday, September 04, 2010 6:53 AM
> To: r-help at r-project.org
> Subject: [R] Levels in returned data.frame after subset
> 
> Dear List,
> 
> When I subset a data.frame, the levels are not re-adjusted (see
> example). Why is this? Am I missing out on some basic stuff here?
> 
> Thanks
> Ulrik
> 
> 
> > m <- data.frame(gender = c("M", "M", "F"), ht = c(172, 186.5, 165), wt = c(91, 99, 74))
> > dim(m)
> [1] 3 3
> 
> > levels(m$gender)
> [1] "F" "M"
> 
> > s <- subset(m, m$gender == "M")
> > dim(s)
> [1] 2 3
> 
> > levels(s$gender)
> [1] "F" "M"
> 
> > cat <- sapply(s, is.factor); s[cat] <- lapply(s[cat], factor)
> > dim(s)
> [1] 2 3
> 
> > levels(s$gender)
> [1] "M"
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


