[Rd] How does "subset" replace arguments? (PR#4193)

Thomas Lumley tlumley at u.washington.edu
Tue Sep 16 14:11:48 MEST 2003


On Tue, 16 Sep 2003 axel.benz at iao.fhg.de wrote:

> Full_Name: Axel Benz
> Version: 1.7.1
> OS: Windows
> Submission from: (NULL) (137.251.33.43)
>
>
> Hello, I guess many people will answer me again that this is an S
> language feature, but I am only a stupid computer scientist and I simply
> do not understand this logic, despite reading a lot about S:

The point they are trying to make is that this sort of question belongs
on r-devel or r-help, not r-bugs.  r-bugs is a repository for bug reports,
not a discussion list.


> > test
>    field           tuckey
> 4  Kreis2          -1
> 5  Kreis5          -2
> 9  Metall          -3
> 17 Kreis1          -4
> 19 Kreis8          -5
>
> > subset(test,field=="Metall")
>   field       tuckey
> 9 Metall      -3
>
> > subset(test,toString(field)=="Metall")
> [1] field   tuckey
> <0 rows> (or 0-length row.names)
>
> This happens every time I use a function with the column name ("field", in this
> case) as parameter in the logic expression in "subset", instead of using the
> column name at top level. I have the impression that the column name is only
> replaced when standing in top level position. I would call that "very lazy
> evaluation" ;-) ;-)
> Thank you for a friendly answer, this language is really weird to me.
>

Your impression is incorrect.  The column name is replaced wherever it
appears in the expression.  The problem with toString is that it
collapses a vector to a single string, so toString(field) is the string
"Kreis2, Kreis5, Metall, Kreis1, Kreis8".  There is no record whose
`field' is equal to that string.  Did you check to see that toString did
what you thought it did?
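
A quick check at the prompt shows the difference.  This is only a small
sketch: the vector below is made up to mirror the `field' column in the
report, and the names collapsed and kept are purely illustrative.

```r
# Hypothetical factor mirroring the `field' column from the report
field <- factor(c("Kreis2", "Kreis5", "Metall", "Kreis1", "Kreis8"))

# toString() joins all elements into ONE string, separated by ", "
collapsed <- toString(field)
length(collapsed)        # 1
collapsed == "Metall"    # a single FALSE

# as.character() keeps one element per record
kept <- as.character(field)
length(kept)             # 5
kept == "Metall"         # FALSE FALSE TRUE FALSE FALSE
```

The single FALSE from the first comparison is applied to every row, which
is why no rows are selected.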


subset() will work as I think you expect if the function returns a vector
of the same length as its input, so that the comparison gives one TRUE or
FALSE per row.

For example, consider one of the built-in data sets

> data(esoph)
> subset(esoph, toString(agegp)=="75+")
[1] agegp     alcgp     tobgp     ncases    ncontrols
<0 rows> (or 0-length row.names)

but

> subset(esoph, as.character(agegp)=="75+")
   agegp     alcgp    tobgp ncases ncontrols
78   75+ 0-39g/day 0-9g/day      1        18
79   75+ 0-39g/day    10-19      2         6
80   75+ 0-39g/day      30+      1         3
81   75+     40-79 0-9g/day      2         5
82   75+     40-79    10-19      1         3
83   75+     40-79    20-29      0         3
84   75+     40-79      30+      1         1
85   75+    80-119 0-9g/day      1         1
86   75+    80-119    10-19      1         1
87   75+      120+ 0-9g/day      2         2
88   75+      120+    10-19      1         1


or to take a really extreme version
> subset(esoph, substr(paste(as.character(agegp),toupper(as.character(agegp))),3,6)== "+ 75")
   agegp     alcgp    tobgp ncases ncontrols
78   75+ 0-39g/day 0-9g/day      1        18
79   75+ 0-39g/day    10-19      2         6
80   75+ 0-39g/day      30+      1         3
81   75+     40-79 0-9g/day      2         5
82   75+     40-79    10-19      1         3
83   75+     40-79    20-29      0         3
84   75+     40-79      30+      1         1
85   75+    80-119 0-9g/day      1         1
86   75+    80-119    10-19      1         1
87   75+      120+ 0-9g/day      2         2
88   75+      120+    10-19      1         1
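
As for how the replacement happens: roughly speaking, subset() captures
the condition unevaluated and then evaluates it with the columns of the
data frame visible as variables, so a column name is found at any depth
of the expression.  The following is a simplified sketch of that idea,
not the actual source of subset(); my_subset is a hypothetical name and
the data frame is reconstructed from the report.

```r
# my_subset: hypothetical, simplified stand-in for subset() on data frames
my_subset <- function(x, condition) {
  expr <- substitute(condition)          # capture the expression, unevaluated
  keep <- eval(expr, x, parent.frame())  # columns of x are visible as variables
  x[keep & !is.na(keep), , drop = FALSE] # keep rows where the condition is TRUE
}

# Data frame reconstructed from the report (values assumed from the printout)
test <- data.frame(field  = c("Kreis2", "Kreis5", "Metall", "Kreis1", "Kreis8"),
                   tuckey = -(1:5))

nrow(my_subset(test, field == "Metall"))                # 1
nrow(my_subset(test, as.character(field) == "Metall"))  # 1
nrow(my_subset(test, toString(field) == "Metall"))      # 0
```

In all three calls `field' is looked up in the data frame; only the last
one fails to match, because toString() has collapsed the column to one
string before the comparison is made.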


	-thomas


