[R] Normalizing grouped data in a data frame

Gabor Grothendieck ggrothendieck at gmail.com
Fri Nov 9 16:03:21 CET 2007


On Nov 9, 2007 5:56 AM, Sandy Small <sandy.small at nhs.net> wrote:
> Hi
> I am a newbie to R but have tried a number of ways in R to do this and
> can't find a good solution. (I could do it out of R in perl or awk but
> would like to know how to do this in R).
>
> I have a large data frame 49 variables and 7000 observations however for
> simplicity I can express it in the following data frame
>
> Base, Image, LVEF, ES_Time
> A, 1,  4.32, 0.89
> A, 2, 4.98, 0.67
> A, 3, 3.7, 0.5
> A, 3. 4.1, 0.8
> B, 1, 7.4, 0.7
> B, 3, 7.2, 0.8
> B, 4, 7.8, 0.6
> C, 1, 5.6, 1.1
> C, 4, 5.2, 1.3
> C, 5, 5.9, 1.2
> C, 6, 6.1, 1.2
> C, 7. 3.2, 1.1
>
> For each value of LVEF and ES_Time I would like to normalise the value
> to the maximum for that factor grouped by Base or Image number, adding
> an extra column to the data frame with the normalised value in it.
>
> So for the Base = B group in the data frame (the data frame should have
> the same length I'm just showing the B part) I would get a modified data
> frame as follows.
>
> Base, Image, LVEF, ES_Time, Norm_LVEF, Norm_ES_Time
> ...
> B,1,7.4, 0.7, 7.4/7.8, 0.7/0.8
> B, 3, 7.2, 0.8, 7.2/7.8, 0.8/0.8
> B, 4, 7.8, 0.6, 7.8/7.8, 0.6/0.8
> ...
>
> Where the results of the division would replace the division shown here.
> I hope this makes sense.
> If anyone can help I would be very grateful.
>
> Sandy Small
> NHS Glasgow, UK

Here is a solution using sqldf:

library(sqldf)
sqldf("select 	u.*,
		u.LVEF / max_LVEF Norm_LVEF,
		u.ES_Time / max_ES_Time Norm_ES_Time
	from
		DF u join
		(select Base, max(LVEF) max_LVEF, max(ES_Time) max_ES_Time
			from DF group by Base)
	using(Base)"
)

See http://sqldf.googlecode.com for more info.



More information about the R-help mailing list