[R] Problem with subset() function?

David Winsemius dwinsemius at comcast.net
Wed Jan 21 00:26:03 CET 2009


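subset() is not misbehaving here. It is density() that is complaining,
because it is being passed a data frame: with select=, subset() returns
a data frame even when only one column is selected, whereas "[" with a
single column name drops the result to a vector. Consider an alternative
that extracts the column with $: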

density(subset(mydf, ht >= 150.0 & wt <= 150.0)$age)

Call:
	density.default(x = subset(mydf, ht >= 150 & wt <= 150)$age)

Data: subset(mydf, ht >= 150 & wt <= 150)$age (29 obs.);	Bandwidth 'bw' = 5.816

        x                y
  Min.   : 4.553   Min.   :3.781e-05
  1st Qu.:22.776   1st Qu.:3.108e-03
  Median :41.000   Median :1.775e-02
  Mean   :41.000   Mean   :1.370e-02
  3rd Qu.:59.224   3rd Qu.:2.128e-02
  Max.   :77.447   Max.   :2.665e-02
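
To see why the two calls differ, it may help to compare what each
expression actually returns. The sketch below rebuilds the poster's mydf
and inspects the results; the names sub_df, keep, vec, d1, d2 and d3 are
introduced here only for illustration. It relies on the documented
default drop = FALSE of subset() for data frames, and is meant as a
minimal illustration rather than a recommendation of one idiom.

```r
set.seed(123)
mydf <- data.frame(ht  = 150 + 10 * rnorm(100),
                   wt  = 150 + 10 * rnorm(100),
                   age = sample(20:60, size = 100, replace = TRUE))

# subset() with select= keeps the data frame structure, even for one column
sub_df <- subset(mydf, ht >= 150 & wt <= 150, select = c(age))
class(sub_df)    # "data.frame", which density.default() rejects

# "[" with a single column name drops to a plain vector by default
keep <- mydf$ht >= 150 & mydf$wt <= 150
vec  <- mydf[keep, "age"]
is.numeric(vec)  # TRUE, so density() accepts it

# Three ways to hand density() a numeric vector while still using subset()
d1 <- density(subset(mydf, ht >= 150 & wt <= 150)$age)
d2 <- density(subset(mydf, ht >= 150 & wt <= 150, select = age)[[1]])
d3 <- density(subset(mydf, ht >= 150 & wt <= 150, select = age, drop = TRUE))

# Conversely, drop = FALSE makes "[" return a data frame, like subset()
class(mydf[keep, "age", drop = FALSE])
```

All three density() calls operate on the same values, so the choice
among $, [[ and drop = TRUE is mostly a matter of taste.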


-- 
David Winsemius


On Jan 20, 2009, at 6:02 PM, Steven McKinney wrote:

> Hi all,
>
> Can anyone explain why the following use of
> the subset() function produces a different
> outcome than the use of the "[" extractor?
>
> The subset() function as used in
>
> density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
>
> appears to me from documentation to be equivalent to
>
> density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
>
> (modulo exclusion of NAs) but use of the former yields an
> error from density.default() (shown below).
>
>
> Is this a bug in the subset() machinery?  Or is it
> a documentation issue, either in the subset()
> documentation or in the density() documentation?
>
> I'm seeing issues such as this with newcomers to R
> who initially seem to prefer using subset() instead
> of the bracket extractor.  At this point these functions
> are clearly not interchangeable.  Should code be patched
> so that they are, or documentation amended to show
> when use of subset() is not appropriate?
>
>> ### Bug in subset()?
>
>> set.seed(123)
>> mydf <- data.frame(ht = 150 + 10 * rnorm(100),
> +                    wt = 150 + 10 * rnorm(100),
> +                    age = sample(20:60, size = 100, replace = TRUE)
> +                    )
>
>
>> density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
> Error in density.default(subset(mydf, ht >= 150 & wt <= 150, select = c(age))) :
>  argument 'x' must be numeric
>
>
>> density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
>
> Call:
> 	density.default(x = mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"])
>
> Data: mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"] (29 obs.);	Bandwidth 'bw' = 5.816
>
>       x                y
> Min.   : 4.553   Min.   :3.781e-05
> 1st Qu.:22.776   1st Qu.:3.108e-03
> Median :41.000   Median :1.775e-02
> Mean   :41.000   Mean   :1.370e-02
> 3rd Qu.:59.224   3rd Qu.:2.128e-02
> Max.   :77.447   Max.   :2.665e-02
>
>
>> sessionInfo()
> R version 2.8.0 Patched (2008-11-06 r46845)
> powerpc-apple-darwin9.5.0
>
> locale:
> C
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
> loaded via a namespace (and not attached):
> [1] Matrix_0.999375-16 grid_2.8.0         lattice_0.17-15    lme4_0.99875-9
> [5] nlme_3.1-89
>
> Steven McKinney
>
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
>
> email: smckinney +at+ bccrc +dot+ ca
>
> tel: 604-675-8000 x7561
>
> BCCRC
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C.
> V5Z 1L3
> Canada
>



