[Rd] subscripts on data frames (PR#9885)

Tony Plate tplate at acm.org
Tue Aug 28 17:44:26 CEST 2007


The line

worms[rev(order(Worm.density)),] [!duplicated(Vegetation),]

looks suspect to me -- it looks like you are first creating a sorted 
version of the dataframe 'worms', and then subsetting it with a logical 
index computed from 'Vegetation' in the original, unsorted order (the 
attached copy), so the index no longer lines up with the sorted rows.  
When reordering dataframes I would avoid 'attaching' them, and I would 
break the expression into two separate expressions, to be sure the 
subsetting refers to the appropriate values:

 > worms <- 
read.table("http://www.bio.ic.ac.uk/research/mjcraw/therbook/data/worms.txt", 
header=T)
 > worms2 <- worms[rev(order(worms$Worm.density)), ]
 > worms2[!duplicated(worms2$Vegetation), ]
       Field.Name Area Slope Vegetation Soil.pH  Damp Worm.density
9     The.Orchard  1.9     0    Orchard     5.7 FALSE            9
16   Water.Meadow  3.9     0     Meadow     4.9  TRUE            8
11    Garden.Wood  2.9    10      Scrub     5.2 FALSE            8
10  Rookery.Slope  1.5     4  Grassland     5.0  TRUE            7
2  Silwood.Bottom  5.1     2     Arable     5.2 FALSE            7
 >
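
To make the mismatch easier to see, here is a minimal sketch on a 
made-up dataframe.  The name 'toy', its columns and its values are 
invented for illustration and are not taken from worms.txt.  The "wrong" 
version applies an index computed in the original row order to the 
sorted rows; the "right" version computes the index on the sorted 
dataframe itself.

```r
# Toy data standing in for 'worms' (all values are made up).
toy <- data.frame(
  field   = c("A", "B", "C", "D", "E"),
  veg     = c("Meadow", "Meadow", "Scrub", "Arable", "Scrub"),
  density = c(2, 8, 5, 7, 3),
  stringsAsFactors = FALSE
)

# Sort rows by decreasing density.
sorted <- toy[rev(order(toy$density)), ]

# Wrong: the logical index is computed on the ORIGINAL row order,
# but is applied by position to the SORTED rows.
wrong <- sorted[!duplicated(toy$veg), ]

# Right: compute the logical index on the sorted dataframe itself.
right <- sorted[!duplicated(sorted$veg), ]

print(wrong)  # Scrub appears twice, Arable is missing
print(right)  # one row per vegetation type, highest density first
```

In this toy case the wrong version keeps fields B, C and E, while the 
right version keeps B, D and C, which is the same kind of symptom as in 
the original report.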

Here's a one-liner involving 'subset':
 > subset(worms[rev(order(worms$Worm.density)), ], !duplicated(Vegetation))
       Field.Name Area Slope Vegetation Soil.pH  Damp Worm.density
9     The.Orchard  1.9     0    Orchard     5.7 FALSE            9
16   Water.Meadow  3.9     0     Meadow     4.9  TRUE            8
11    Garden.Wood  2.9    10      Scrub     5.2 FALSE            8
10  Rookery.Slope  1.5     4  Grassland     5.0  TRUE            7
2  Silwood.Bottom  5.1     2     Arable     5.2 FALSE            7
 >

-- Tony Plate

m.crawley at imperial.ac.uk wrote:
> I'm not sure if this is a bug, or if I'm doing something wrong.
>
> From the worms dataframe, which is in a file called worms.txt at
>
> http://www.imperial.ac.uk/bio/research/crawley/therbook
> <http://www.imperial.ac.uk/bio/research/mjcraw/therbook/index.htm>
>
> the idea is to extract a subset of the rows, sorted in declining order
> of worm density, with only the maximum worm density from each vegetation
> type:
>
> worms<-read.table("c:\\temp\\worms.txt",header=T)
> attach(worms)
> names(worms)
>
> [1] "Field.Name"   "Area"         "Slope"        "Vegetation"   "Soil.pH"
> [6] "Damp"         "Worm.density"
>
> Using "not duplicated" I get two rows for Meadow and none for Scrub
>
> worms[rev(order(Worm.density)),] [!duplicated(Vegetation),]
>
>        Field.Name Area Slope Vegetation Soil.pH  Damp Worm.density
> 9     The.Orchard  1.9     0    Orchard     5.7 FALSE            9
> 16   Water.Meadow  3.9     0     Meadow     4.9  TRUE            8
> 10  Rookery.Slope  1.5     4  Grassland     5.0  TRUE            7
> 2  Silwood.Bottom  5.1     2     Arable     5.2 FALSE            7
> 4     Rush.Meadow  2.4     5     Meadow     4.9  TRUE            5
>
> and here is the correct set of rows, but in the wrong order, using
> unique
>
> worms[rev(order(Worm.density)),] [unique(Vegetation),]
>
>        Field.Name Area Slope Vegetation Soil.pH  Damp Worm.density
> 16   Water.Meadow  3.9     0     Meadow     4.9  TRUE            8
> 9     The.Orchard  1.9     0    Orchard     5.7 FALSE            9
> 11    Garden.Wood  2.9    10      Scrub     5.2 FALSE            8
> 2  Silwood.Bottom  5.1     2     Arable     5.2 FALSE            7
> 10  Rookery.Slope  1.5     4  Grassland     5.0  TRUE            7
>
> Best wishes,
>
> Mick
>
> Prof  M.J. Crawley  FRS
>
> Imperial College London
> Silwood Park
> Ascot
> Berks
> SL5 7PY
> UK
>
> Phone (0) 207 5942 216
> Fax     (0) 207 5942 339