[R] Reshaping a dataframe with conditional summary of columns - apply or reshape?

Thu Jan 24 19:51:26 CET 2008

Try this:

x$survey1[x$count==0] <- 0
x$survey1[x$count %in% 1:11] <- 1
x$survey1[x$count > 11] <- 2
 x.wide <- reshape(x, v.names="survey1", idvar="location",
timevar="survey1", direction="wide")
x.wide <- x.wide[order(x.wide$location),]
x.wide$var1 <- tapply(x$var1, x$location, mean)
x.wide

On 24/01/2008, Dave Hewitt <dhewitt at vims.edu> wrote:
> I need to do a somewhat complex rearrangement of an existing dataframe that
> involves an "on-the-fly" conditional summary as well. I've tried to make
> the example as simple as possible. Code to produce the object being acted
> on is at the end of the message. I hope this isn't too much for a single
> posting. Any guidance will be much appreciated.
>
> The input example looks like this:
>
> survey  year    count   location        var1    var2
> 1       1       2       A1      21.2    1
> 2       1       0       A1      15.6    1
> 3       1       29      A1      12.1    1
> 1       1       11      B3      9       2
> 3       1       5       B3      4.8     2
> 1       2       7       A2      20.1    4
> 2       2       2       A2      19.2    4
>
> I need to rearrange it to look like this (the order of the columns is not
> important):
>
> location        year    survey1 survey2 survey3 var1summ        var2
> A1      1       1       0       2       16.3    1
> B3      1       2       NA      1       6.9     2
> A2      2       1       1       NA      19.7    4
>
> The operations/conditions that I need are:
>
> (A) The observations in "count" for each survey at each location in a given
> year need to be reduced according to the following rules and then placed in
> new columns "survey1", "survey2", "survey3" :
> -- when  count = 0,  surveyX = 0
> -- when  0 < count < 11,  surveyX = 1
> -- when  11 < count,  surveyX = 2
>
> (B) The observations in var1 need to be summarized (here, by taking the
> mean) across all surveys for a given location in a given year and placed in
> "var1summ"
>
> (C) Observations in var2 need to be summarized in a similar way, except
> that they don't change across surveys for a given location in a given year
> (so mean, median, etc. would work, but a more general solution would be
> nice for variables that are categorical)
>
> (D) Surveys (rows) can be missing for a given location in a given year. I'd
> like to retain a column for each survey in the new dataframe and denote
> surveys that were missing as NA. E.g., survey2 is missing for location B3
> in year 1.
>
>
> ** Object creation:
> x <- structure(list(survey = c(1L, 2L, 3L, 1L, 3L, 1L, 2L), year = c(1L,
> 1L, 1L, 1L, 1L, 2L, 2L), count = c(2L, 0L, 29L, 11L, 5L, 7L, 2L), location
> = structure(c(1L, 1L, 1L, 3L, 3L, 2L, 2L), .Label = c("A1", "A2", "B3"),
> class = "factor"), var1 = c(21.2, 15.6, 12.1, 9, 4.8, 20.1, 19.2), var2 =
> c(1L, 1L, 1L, 2L, 2L, 4L, 4L)), .Names = c("survey", "year", "count",
> "location", "var1", "var2"), class = "data.frame", row.names = c(NA, -7L))
>
> Thanks,
> Dave Hewitt
> Fisheries Science
> Virginia Institute of Marine Science, Gloucester Point, VA, USA
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O