[R] dataframe calculations based on certain values of a column

Berend Hasselman bhh at xs4all.nl
Wed Mar 26 17:50:05 CET 2014


On 26-03-2014, at 17:09, Johannes Radinger <johannesradinger at gmail.com> wrote:

> Hi,
> 
> I have data in a dataframe with the following structure:
> var1 <- c("a","b","c","a","b","c","a","b","c")
> var2 <- c("X","X","X","Y","Y","Y","Z","Z","Z")
> var3 <- c(1,2,2,5,2,6,7,4,4)
> df <- data.frame(var1,var2,var3)
> 
> Now I'd like to calculate relative values of var3. These values
> should be relative to the base value (where var1 == "c"), which is
> given for each group (var2).
> 
> To illustrate what my result column should look like: I divide
> the column var3 by the vector c(2,2,2,6,6,6,4,4,4) (= for each group
> of var2, the value where var1 is "c").
> 
> Of course this can also be done like this:
> df$div <- rep(df$var3[df$var1=="c"],each=length(unique(df$var1)))
> df$result_calc <- df$var3/df$div
> 
> 
> However, what if the dataframe is not as simple and not as well ordered
> as in the example here? For example, there is always a value c for each
> group, but all the "c"s are clumped in the last rows of the dataframe or
> scattered in a random manner. Is there a simple way to still calculate
> such relative values? Probably with an approach using apply, but maybe
> someone can give me a hint. Or do I need to sort my dataframe in order
> to do such calculations?


Split the data.frame into a list of groups defined by column var2 and
perform the calculation you need within each group, like this:

df <- data.frame(var1,var2,var3, stringsAsFactors=FALSE)
L <- by(df, list(df$var2), FUN=function(x) {
    k <- which(x$var1 == "c")      # row holding the base value in this group
    x$rel <- x$var3 / x$var3[k]
    x
})
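
As a quick check that this does not depend on the row order, here is a small sketch that runs the same by() call on a shuffled copy of the example data. The names shuffled and base_rel and the seed are made up for illustration; it assumes, as stated in the question, that every group has exactly one row with var1 == "c".

```r
var1 <- c("a","b","c","a","b","c","a","b","c")
var2 <- c("X","X","X","Y","Y","Y","Z","Z","Z")
var3 <- c(1,2,2,5,2,6,7,4,4)
df <- data.frame(var1, var2, var3, stringsAsFactors=FALSE)

# Shuffle the rows to mimic an unordered data.frame
# (the seed is arbitrary, chosen only for reproducibility).
set.seed(42)
shuffled <- df[sample(nrow(df)), ]

L <- by(shuffled, list(shuffled$var2), FUN=function(x) {
    k <- which(x$var1 == "c")      # position of the base row within this group
    x$rel <- x$var3 / x$var3[k]    # divide every value in the group by the base value
    x
})

# The base row of every group should have a relative value of 1,
# wherever that row ended up after shuffling.
base_rel <- sapply(L, function(g) g$rel[g$var1 == "c"])
print(base_rel)
```

Because the base row is located with which() inside each group, no sorting of the data.frame is needed beforehand.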


And then convert the list L back to a data.frame.

See the following two Stack Overflow pages for various ways to do this.

http://stackoverflow.com/questions/4227223/r-list-to-data-frame
http://stackoverflow.com/questions/4512465/what-is-the-most-efficient-way-to-cast-a-list-as-a-data-frame?rq=1

Two methods from the first page:

data.frame(Reduce(rbind,L))

library(plyr)
ldply(L, data.frame)

and one method from the second page:

do.call(rbind,L)
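
As an aside, and not one of the methods from the pages above: if the rows should stay in their original order, a lookup with match() can avoid the split and recombine steps altogether. This is only a minimal sketch, assuming exactly one row with var1 == "c" per group (as stated in the question); the names base and idx are made up for illustration.

```r
var1 <- c("a","b","c","a","b","c","a","b","c")
var2 <- c("X","X","X","Y","Y","Y","Z","Z","Z")
var3 <- c(1,2,2,5,2,6,7,4,4)
df <- data.frame(var1, var2, var3, stringsAsFactors=FALSE)

base <- df[df$var1 == "c", ]         # the base rows, one per group
idx  <- match(df$var2, base$var2)    # for each row, the index of its group's base row
df$rel <- df$var3 / base$var3[idx]   # divide by the matching base value
print(df)
```

Since match() looks up each row's group by value, the position of the "c" rows in the data.frame does not matter here either.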

Berend


