[R] redundant factor levels after subsetting a dataset

Daniel Malter daniel at umd.edu
Thu Nov 12 03:00:54 CET 2009


# I have a data frame with a numeric variable and a character variable
# that is stored as a factor.

x=c(1,2,3,2,0,2,-1,-2,-4)
md=c(rep("Miller",3), rep("Richard",3),rep("Smith",3))
data1=data.frame(x,md,stringsAsFactors=TRUE)

# I subset this data frame so that one level of the factor (Smith)
# does not appear in the new dataset.

data2=data1[data1$x>0,]
data3=subset(data1,x>0)

# However, when I check the levels of the factor in the subset data
# frame, it still lists levels that are now unused.

unique(data2$md)
unique(data3$md)
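A quick way to confirm that the rows are gone but the levels are not is to compare the values that actually remain with levels(). This is a minimal sketch that rebuilds the example data; it assumes stringsAsFactors = TRUE has to be passed explicitly, since R 4.0.0 changed the data.frame() default to FALSE, and it indexes with data1$x rather than the free-standing x.

```r
# Rebuild the example data; stringsAsFactors = TRUE makes md a factor
# even in R >= 4.0.0, where the default is FALSE.
x <- c(1, 2, 3, 2, 0, 2, -1, -2, -4)
md <- c(rep("Miller", 3), rep("Richard", 3), rep("Smith", 3))
data1 <- data.frame(x, md, stringsAsFactors = TRUE)

data2 <- data1[data1$x > 0, ]

# The Smith rows are gone ...
print(unique(as.character(data2$md)))
# ... but the factor still carries all three levels.
print(levels(data2$md))
print(nlevels(data2$md))
```

Subsetting rows never touches the levels attribute of a factor column, which is why Smith survives.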

# This causes problems with table() and tapply() that I want to avoid:
# table() still reports Smith with a count of 0, and tapply() returns NA
# for Smith.

table(data2$md)
tapply(data2$x,data2$md,mean)

table(data3$md)
tapply(data3$x,data3$md,mean)
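For reference, this is a sketch of what the two calls return for data2, rebuilt from the data above (again assuming stringsAsFactors = TRUE is set explicitly). Richard's 0 fails x > 0, so only two of his rows remain.

```r
x <- c(1, 2, 3, 2, 0, 2, -1, -2, -4)
md <- c(rep("Miller", 3), rep("Richard", 3), rep("Smith", 3))
data1 <- data.frame(x, md, stringsAsFactors = TRUE)
data2 <- data1[data1$x > 0, ]

# table() counts every level, including the empty Smith level.
counts <- table(data2$md)
# tapply() has no values for Smith and fills that cell with NA.
means <- tapply(data2$x, data2$md, mean)
print(counts)
print(means)
```

Both results are indexed by the full set of levels, which is exactly the behaviour described above.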

# Basically, I want to remove "Smith" completely from data2 or data3
# so that it no longer shows up in table() or tapply() results.
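Two common fixes, offered as suggestions rather than the only approach: re-apply factor() to the column, which keeps only the levels that actually occur, or call droplevels(), which does the same for every factor column of a data frame. As far as I know, droplevels() was only added in R 2.12.0, after this post, so on older versions factor() is the option. The sketch assumes the same rebuilt data as above.

```r
x <- c(1, 2, 3, 2, 0, 2, -1, -2, -4)
md <- c(rep("Miller", 3), rep("Richard", 3), rep("Smith", 3))
data1 <- data.frame(x, md, stringsAsFactors = TRUE)

# Option 1: re-create the factor; factor() keeps only levels that occur.
data2 <- data1[data1$x > 0, ]
data2$md <- factor(data2$md)

# Option 2: droplevels() drops unused levels from all factor columns.
data3 <- droplevels(subset(data1, x > 0))

print(table(data2$md))
print(tapply(data3$x, data3$md, mean))
```

Option 1 is handy when only one column matters; option 2 is less error-prone when a data frame has several factor columns.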

Thanks for any pointers,
Daniel
