[R] Problem with subset() function?

Steven McKinney smckinney at bccrc.ca
Wed Jan 21 00:02:14 CET 2009


Hi all,

Can anyone explain why the following use of
the subset() function produces a different
outcome than the use of the "[" extractor?

The subset() function as used in

 density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))

appears to me from documentation to be equivalent to

 density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])

(modulo exclusion of NAs) but use of the former yields an 
error from density.default() (shown below).
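
A quick way to test the assumed equivalence is to inspect what each
expression returns before density() sees it. The sketch below is an
illustration only; the names res_subset and res_bracket are made up
for the example, and it assumes subset() is called with its default
arguments.

```r
# Same simulated data as in the transcript further down
set.seed(123)
mydf <- data.frame(ht = 150 + 10 * rnorm(100),
                   wt = 150 + 10 * rnorm(100),
                   age = sample(20:60, size = 100, replace = TRUE))

# Hypothetical names, used only for this comparison
res_subset  <- subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age))
res_bracket <- mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"]

# Compare the type of object each approach hands to density()
print(class(res_subset))
print(class(res_bracket))

# density.default() requires is.numeric(x) to be TRUE
print(is.numeric(res_subset))
print(is.numeric(res_bracket))
```

If the first result turns out to be a one-column data frame rather
than a vector, that would account for the "argument 'x' must be
numeric" error, since density.default() checks is.numeric(x).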


Is this a bug in the subset() machinery?  Or is it
a gap in the documentation for subset() or for
density()?

I'm seeing issues such as this with newcomers to R
who initially seem to prefer using subset() instead
of the bracket extractor.  At this point these functions
are clearly not interchangeable.  Should code be patched
so that they are, or documentation amended to show
when use of subset() is not appropriate?

> ### Bug in subset()?

> set.seed(123)
> mydf <- data.frame(ht = 150 + 10 * rnorm(100),
+                    wt = 150 + 10 * rnorm(100),
+                    age = sample(20:60, size = 100, replace = TRUE)
+                    )


> density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
Error in density.default(subset(mydf, ht >= 150 & wt <= 150, select = c(age))) : 
  argument 'x' must be numeric


> density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])

Call:
	density.default(x = mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"])

Data: mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"] (29 obs.);	Bandwidth 'bw' = 5.816

       x                y            
 Min.   : 4.553   Min.   :3.781e-05  
 1st Qu.:22.776   1st Qu.:3.108e-03  
 Median :41.000   Median :1.775e-02  
 Mean   :41.000   Mean   :1.370e-02  
 3rd Qu.:59.224   3rd Qu.:2.128e-02  
 Max.   :77.447   Max.   :2.665e-02  

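
If subset() is preferred for the row selection, the column can still
be handed to density() as a plain vector. Three possible variants are
sketched here; the object names are hypothetical, and the drop = TRUE
variant assumes the installed subset() method for data frames accepts
a drop argument.

```r
# Same simulated data as above
set.seed(123)
mydf <- data.frame(ht = 150 + 10 * rnorm(100),
                   wt = 150 + 10 * rnorm(100),
                   age = sample(20:60, size = 100, replace = TRUE))

# Variant 1: extract the column from the one-column data frame
sel <- subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age))
d1 <- density(sel$age)

# Variant 2: ask subset() to drop the data frame structure
# (assumes a 'drop' argument is available in subset.data.frame)
d2 <- density(subset(mydf, ht >= 150.0 & wt <= 150.0,
                     select = c(age), drop = TRUE))

# Variant 3: select rows with subset(), then evaluate density()
# on the age column inside with()
d3 <- with(subset(mydf, ht >= 150.0 & wt <= 150.0), density(age))

print(d1)
```

All three pass a numeric vector to density(), which is the same kind
of object the "[" call with a single column name produces.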

> sessionInfo()
R version 2.8.0 Patched (2008-11-06 r46845) 
powerpc-apple-darwin9.5.0 

locale:
C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

loaded via a namespace (and not attached):
[1] Matrix_0.999375-16 grid_2.8.0         lattice_0.17-15    lme4_0.99875-9    
[5] nlme_3.1-89       

Steven McKinney

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada



