[R] reshaping a dataset

Gabor Grothendieck ggrothendieck at gmail.com
Wed Sep 13 06:32:28 CEST 2006


If I understand this correctly we want to sum the mass over each combination
of the first 6 variables and display the result with the 6th, prey,
along the top and the others along the side.

library(reshape)
testm <- melt(test, id = 1:6)
cast(testm, nbpc + trip + set + tagno + depth ~ prey, sum)
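
For anyone without the reshape package at hand, the same sum-and-spread
idea can be sketched in base R with tapply. This is only an illustration
on a made-up toy data frame (the name toy and its values are hypothetical,
not the poster's data), and it uses a single id variable for brevity:

```r
# Toy data (hypothetical values): stomach 1 contains prey 187 twice
toy <- data.frame(tagno = c(1, 1, 1, 2),
                  prey  = c(187, 187, 438, 792),
                  mass  = c(1.5, 2.5, 3.0, 4.0))

# Sum mass for every tagno x prey combination;
# tagno runs down the side, prey codes run along the top
wide <- with(toy, tapply(mass, list(tagno = tagno, prey = prey), sum))
wide
```

Combinations that never occur (e.g. prey 792 in stomach 1) come out as NA
in this sketch.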

Now fix up the NAs in the prey columns only, so that any NAs in depth
are left alone.
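
One possible way to do that in base R is to index just the prey columns
and zero the NAs there. The sketch below uses a small made-up data frame;
the values and the "mass.<code>" column names are assumptions (they mimic
what reshape(direction="wide") produces, whereas cast() names the columns
after the prey codes themselves), so adjust the column selection to match
your own result:

```r
# Hypothetical wide result: depth holds a genuine NA that must survive
result <- data.frame(tagno    = c(11, 12, 13),
                     depth    = c(100, NA, 250),
                     mass.187 = c(4.0, NA, 1.2),
                     mass.438 = c(NA, 2.0, NA))

# Pick the prey columns by name prefix (assumed naming, see above)
prey.cols <- grep("^mass\\.", names(result))

# Replace NA by 0 in those columns only; other columns are not touched
result[prey.cols][is.na(result[prey.cols])] <- 0
result
```

Selecting the columns by position, e.g. prey.cols <- 6:9, would work the
same way if the names are not convenient to match.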

On 9/12/06, Denis Chabot <chabotd at globetrotter.net> wrote:
> Hi,
>
> I'm trying to move to R the last few data handling routines I was
> performing in SAS.
>
> I'm working on stomach content data. In the simplified example I
> provide below, there are variables describing the origin of each prey
> item (nbpc is a ship number, each ship may have been used on
> different trips, each trip has stations, and individual fish (tagno)
> can be caught at each station).
>
> For each stomach the number of lines corresponds to the number of
> prey items. Thus a variable identifies prey type, and others (here
> only one, mass) provide information on prey abundance or size or
> digestion level.
>
> Finally, there can be accompanying variables that are not used but
> that I need to keep for later analyses (e.g. depth in the example
> below).
>
> At some point I need to transform such a dataset into another format
> where each stomach occupies a single line, and there are columns for
> each prey item.
>
> The "reshape" function works really well, my program is in fact
> simpler than the SAS equivalent (not shown, don't want to bore you,
> but available on request), except that I need zeros when prey types
> are absent from a stomach instead of NAs, a problem for which I only
> have a shaky solution at the moment:
>
> 1) creation of a dummy dataset:
> #######
> nbpc <- rep(c(20,34), c(110,90))
> trip <- c(rep(1:3, c(40, 40, 30)), rep(1:2, c(60,30)))
> set <- c(rep(1:4, c(10, 8, 7, 15)), rep(c(10,12), c(25,15)),
>          rep(1:3, rep(10,3)), rep(10:12, c(20, 10, 30)),
>          rep(7:8, rep(15,2)))
> depth <- c(rep(c(100, 150, 200, 250), c(10, 8, 7, 15)),
>          rep(c(100,120), c(25,15)), rep(c(75, 50, 200), rep(10,3)),
>          rep(c(200, 150, 50), c(20, 10, 30)), rep(c(100, 250), rep(15,2)))
> tagno <- rep(round(runif(42,1,200)),
>              c(7,3, 4,4, 2,2,3, 5,5,5,  4,6,4,3,5,3, 7,8, 4,6, 5,5, 7,3,
>                6,6,4,4, 4,6, 3,3,4,5,5,6,4, 5,5,5, 8,7))
> prey.codes <-c(187, 438, 792, 811)
> prey <- sample(prey.codes, 200, replace=TRUE)
> mass <- runif(200, 0, 10)
>
> test <- data.frame(nbpc, trip, set, depth, tagno, prey, mass)
> ########
>
> Because there are often multiple occurrences of the same prey in a
> single stomach, I need to sum them for each stomach before using
> "reshape". Here I use summaryBy (from the doBy package) because my
> understanding of the many variants of "apply" is not very good:
>
> ########
> library(doBy)
> test2 <- summaryBy(mass~nbpc+trip+set+tagno+prey, data=test, FUN=sum,
>                    keep.names=TRUE, id=~depth)
>
> #this messes up sorting order, I fix it
> k <- order(test2$nbpc, test2$trip, test2$set, test2$tagno)
> test3 <- test2[k,]
> result <- reshape(test3, v.names="mass", idvar=c("nbpc", "trip",
> "set", "tagno"),
>                 timevar="prey", direction="wide")
> #########
>
> I'm quite happy with this, although you may know of better ways of
> doing it.
> But my problem is with prey types that are absent from a stomach. In
> later analyses, I need them to have zero abundance instead of NA.
> My shaky solution is:
> #########
> empties <- is.na(result)
> result[empties] <- 0
> #########
>
> which did the job in this example, but it won't always. For instance
> there could have been NAs for "depth", which I do not want to become
> zero.
>
> Is there a way to transform NAs into zeros for multiple columns of a
> dataframe in one step, while ignoring some columns?
>
> Or maybe there is another way to achieve this that would have put
> zeros where I need them (i.e. something else than "reshape")?
>
> Thanking you in advance,
>
> Denis Chabot
>


