[R] dividing a dataframe column by different constants

David Winsemius dwinsemius at comcast.net
Thu Sep 3 18:43:49 CEST 2009


On Sep 3, 2009, at 12:17 PM, Ottorino-Luca Pantani wrote:

> Dear R users, today I've got the following problem.
> Below is a data frame as an example.
> There are some SAMPLEs for which a CONCentration was recorded
> through TIME.
> The time span over which the concentration was recorded is not always
> the same:
> 10 points for sample A, 7 points for sample B and 11 for sample C.
>
> Also the initial concentration was not the same for the three samples.
>
> I would like to express the concentrations as a % of the concentration at
> time = 1, so I wrote the following code, which does the job but
> is impractical when there are, as in my real case, more than
> one hundred samples.
> It is known that the maximum concentration occurs at the minimum
> time; all the other concentrations in the sample are to be divided
> by it.
>
> I'm quite sure that there's a more elegant solution, but I really
> cannot even imagine how to write it.
>
> Thanks in advance for your time
>
>
> (df.mydata <- data.frame(
>                        CONC =
>                        c(seq( from = 1, to = 0.1, by = -0.1 ),
>                          seq( from = 0.8, to = 0.2, by = -0.1 ),
>                          seq( from = 0.6, to = 0.1, by = -0.05 )),
>                        TIME =
>                        c(1:10,
>                          2:8,
>                          4:14 ),
>                        SAMPLE = c( rep( "A", 10 ),
>                          rep( "B", 7 ),
>                          rep( "C", 11 )
>                          )
>                        )
> )

Perhaps this:

by(df.mydata, df.mydata$SAMPLE, function(x) x$CONC/x$CONC[1] )

...or if you wanted to use max(x$CONC) as the standardizing value,
that ought to work as well. With your data it gives identical results.

The equivalent tapply construction would be:

tapply(df.mydata$CONC, df.mydata$SAMPLE, function(x) x/x[1] )
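
Note that both calls above return the ratios grouped by sample rather than as a column of the data frame. If the goal is a new column in the original row order, ave() is one option. A minimal sketch, assuming the rows are already sorted by TIME within each SAMPLE (as in the example data), so that x[1] is the earliest concentration; the column name PERCENTAGE is borrowed from the original post:

```r
# Rebuild the example data from the original post
df.mydata <- data.frame(
  CONC   = c(seq(1, 0.1, by = -0.1),
             seq(0.8, 0.2, by = -0.1),
             seq(0.6, 0.1, by = -0.05)),
  TIME   = c(1:10, 2:8, 4:14),
  SAMPLE = c(rep("A", 10), rep("B", 7), rep("C", 11))
)

# Assumption: rows are sorted by TIME within each SAMPLE, so x[1] is the
# concentration at the earliest time point of that sample.
# ave() applies FUN within each SAMPLE group and returns a vector in the
# original row order, so it can be assigned directly as a column.
df.mydata$PERCENTAGE <- ave(df.mydata$CONC, df.mydata$SAMPLE,
                            FUN = function(x) x / x[1])
```

The values are fractions of the starting concentration; multiply by 100 if true percentages are wanted.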


> MAX <- tapply( df.mydata$CONC, df.mydata$SAMPLE, max )
> (df.mydata$PERCENTAGE <-
> ifelse(df.mydata$SAMPLE == "A",  df.mydata$CONC / MAX[1],
>       ifelse(df.mydata$SAMPLE == "B",  df.mydata$CONC / MAX[2],
>              df.mydata$CONC / MAX[3])))
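
The nested ifelse() calls above can probably be avoided for any number of samples, because tapply() returns MAX with names taken from SAMPLE. A sketch, assuming every sample label in the data frame appears in MAX; as.character() is there in case SAMPLE is stored as a factor, and as.vector() drops the names and array dimension before the division:

```r
# Rebuild the example data from the original post
df.mydata <- data.frame(
  CONC   = c(seq(1, 0.1, by = -0.1),
             seq(0.8, 0.2, by = -0.1),
             seq(0.6, 0.1, by = -0.05)),
  TIME   = c(1:10, 2:8, 4:14),
  SAMPLE = c(rep("A", 10), rep("B", 7), rep("C", 11))
)

# One maximum per sample, named "A", "B", "C"
MAX <- tapply(df.mydata$CONC, df.mydata$SAMPLE, max)

# Index MAX by sample name to get one divisor per row
divisor <- as.vector(MAX[as.character(df.mydata$SAMPLE)])

df.mydata$PERCENTAGE <- df.mydata$CONC / divisor
```

Unlike x/x[1], this version does not depend on the row order within each sample.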
>
> -- 
> Ottorino-Luca Pantani, Università di Firenze
> Dip. Scienza del Suolo e Nutrizione della Pianta
> P.zle Cascine 28 50144 Firenze Italia
> Tel 39 055 3288 202 (348 lab) Fax 39 055 333 273 OLPantani at unifi.it  http://www4.unifi.it/dssnp/
>

David Winsemius, MD
Heritage Laboratories
West Hartford, CT



