[R] Normalizing grouped data in a data frame

Duncan Murdoch murdoch at stats.uwo.ca
Fri Nov 9 12:35:26 CET 2007


Sandy Small wrote:
> Hi,
> I am a newbie to R but have tried a number of ways to do this in R and
> can't find a good solution. (I could do it outside R in Perl or awk, but
> I would like to know how to do it in R.)
>
> I have a large data frame (49 variables and 7000 observations), but for
> simplicity I can express it as the following data frame:
>
> Base, Image, LVEF, ES_Time
> A, 1, 4.32, 0.89
> A, 2, 4.98, 0.67
> A, 3, 3.7, 0.5
> A, 3, 4.1, 0.8
> B, 1, 7.4, 0.7
> B, 3, 7.2, 0.8
> B, 4, 7.8, 0.6
> C, 1, 5.6, 1.1
> C, 4, 5.2, 1.3
> C, 5, 5.9, 1.2
> C, 6, 6.1, 1.2
> C, 7, 3.2, 1.1
>
> For each value of LVEF and ES_Time I would like to normalise the value
> to the maximum of that variable within its group (grouped by Base or by
> Image number), adding an extra column to the data frame with the
> normalised value in it.
>
> So for the Base = B group (the data frame should keep the same number of
> rows; I'm just showing the B part) I would get a modified data frame as
> follows:
>
> Base, Image, LVEF, ES_Time, Norm_LVEF, Norm_ES_Time
> ...
> B, 1, 7.4, 0.7, 7.4/7.8, 0.7/0.8
> B, 3, 7.2, 0.8, 7.2/7.8, 0.8/0.8
> B, 4, 7.8, 0.6, 7.8/7.8, 0.6/0.8
> ...
>
> The result of each division would replace the expression shown here.
> I hope this makes sense.
> If anyone can help, I would be very grateful.
>   
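To try any of the suggestions in R, the sample data from the question can be read straight from text. This is a minimal sketch that assumes the full stops after the Image values 3 and 7 in two of the rows were meant to be commas. It relies on the `text` argument of read.csv(), which current versions of R provide (it may not exist in older releases); `txt` is just an illustrative name.

```r
# Sample data from the question, typed as comma-separated text
# (the two stray full stops are assumed to be commas).
txt <- "Base, Image, LVEF, ES_Time
A, 1, 4.32, 0.89
A, 2, 4.98, 0.67
A, 3, 3.7, 0.5
A, 3, 4.1, 0.8
B, 1, 7.4, 0.7
B, 3, 7.2, 0.8
B, 4, 7.8, 0.6
C, 1, 5.6, 1.1
C, 4, 5.2, 1.3
C, 5, 5.9, 1.2
C, 6, 6.1, 1.2
C, 7, 3.2, 1.1"

# strip.white = TRUE removes the space that follows each comma.
olddf <- read.csv(text = txt, strip.white = TRUE)
str(olddf)
```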
You want to look at the by(), tapply() or sparseby() functions (the
latter is in the reshape package; the others are in base R).
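As an illustration of the tapply() route, the sketch computes one maximum per Base group and divides each row by its own group's maximum, keeping the rows in their original order. The helper `norm_by_group_max()` is a hypothetical name, not a base R function, and the data frame is retyped from the question assuming the two stray full stops were commas.

```r
# Hypothetical helper: divide each value of x by the maximum of x within
# its group. tapply() returns one maximum per group, labelled by group.
norm_by_group_max <- function(x, group) {
  group   <- as.character(group)      # look up by label, not factor code
  grp_max <- tapply(x, group, max)    # e.g. A = 4.98, B = 7.8 for LVEF
  as.numeric(x / grp_max[group])      # drop names/dims left by the lookup
}

olddf <- data.frame(
  Base    = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"),
  Image   = c(1, 2, 3, 3, 1, 3, 4, 1, 4, 5, 6, 7),
  LVEF    = c(4.32, 4.98, 3.7, 4.1, 7.4, 7.2, 7.8, 5.6, 5.2, 5.9, 6.1, 3.2),
  ES_Time = c(0.89, 0.67, 0.5, 0.8, 0.7, 0.8, 0.6, 1.1, 1.3, 1.2, 1.2, 1.1)
)

# Add the normalised columns; the row count and order are unchanged.
newdf <- olddf
newdf$Norm_LVEF    <- norm_by_group_max(newdf$LVEF, newdf$Base)
newdf$Norm_ES_Time <- norm_by_group_max(newdf$ES_Time, newdf$Base)
print(newdf[newdf$Base == "B", ])
```

Passing `newdf$Image` instead of `newdf$Base` as the grouping variable would normalise within each Image number, the other grouping mentioned in the question.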

For example, I think this untested code does what you want:

library(reshape)  # sparseby() is in the reshape package

newdf <- sparseby(olddf, c("Base", "Image"),
                  function(subset)
                      within(subset,
                             { Norm_LVEF <- LVEF/max(LVEF)
                               Norm_ES_Time <- ES_Time/max(ES_Time)
                             }))

where olddf is the original data frame and newdf is the newly created one.
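For anyone without the reshape package, the same within() idea can be sketched with base R's by(). This is a variant, not the code above: it groups by Base alone so the result matches the Base = B example in the question, and because by() returns one data frame per group, do.call(rbind, ...) is assumed to be an acceptable way to stack them back together. The data are retyped from the question (stray full stops read as commas), and `pieces` is an illustrative name.

```r
olddf <- data.frame(
  Base    = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "C", "C"),
  Image   = c(1, 2, 3, 3, 1, 3, 4, 1, 4, 5, 6, 7),
  LVEF    = c(4.32, 4.98, 3.7, 4.1, 7.4, 7.2, 7.8, 5.6, 5.2, 5.9, 6.1, 3.2),
  ES_Time = c(0.89, 0.67, 0.5, 0.8, 0.7, 0.8, 0.6, 1.1, 1.3, 1.2, 1.2, 1.1)
)

# by() splits olddf by Base and applies the function to each subset;
# within() adds the normalised columns to that subset.
pieces <- by(olddf, olddf$Base, function(subset)
  within(subset, {
    Norm_LVEF    <- LVEF / max(LVEF)
    Norm_ES_Time <- ES_Time / max(ES_Time)
  }))

# Stack the per-group data frames and reset the "A.1"-style row names.
newdf <- do.call(rbind, pieces)
rownames(newdf) <- NULL
print(newdf[newdf$Base == "B", ])
```

The stacked rows come out ordered by group; here that matches the original order only because the data are already sorted by Base.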

Duncan Murdoch