[R] problem of data manipulation

hadley wickham h.wickham at gmail.com
Tue Jan 19 22:22:59 CET 2010


On Mon, Jan 18, 2010 at 1:54 PM, Bert Gunter <gunter.berton at gene.com> wrote:
> One way to do it:
>
> 1. Convert your date column to the Date class using the as.Date() function.
> This allows you to do the necessary arithmetic on the dates below.
> dt <- as.Date(a[,4],"%d/%m/%Y")
>
> 2. Create a factor out of your first three columns whose levels are in the
> same order as the unique rows. Something likes the following should do it:
> fac <- do.call(paste,a[,-4])
> fac <- factor(fac, levels=unique(fac))
>
> This allows you to choose the groups of rows whose dates you wish to compare
> and maintain their correct order in the data frame
>
> 3. Then use tapply:
> a[unlist(tapply(dt,fac,function(x)x-min(x) < 7)),]
>
> (unlist is needed to remove the list structure and concatenate the logical
> indices to obtain the subscripting vector).

Here's the same basic approach with the plyr package:

a <- structure(list(var1 = structure(c(3L, 1L, 1L, 2L, 2L, 2L), .Label = c("c",
"n", "s"), class = "factor"), var2 = c(1, 1, 1, 2, 2, 2), var3 = c(2,
2, 2, 1, 1, 1), dt = structure(c(10592, 10997, 11000, 10998,
11002, 11010), class = "Date")), .Names = c("var1", "var2", "var3",
"dt"), row.names = c(NA, -6L), class = "data.frame")

library(plyr)
ddply(a, c("var1", "var2", "var3"), subset, dt - min(dt) < 7)

Hadley

-- 
http://had.co.nz/



More information about the R-help mailing list