[R] Averaging within a range of values

Gabor Grothendieck ggrothendieck at gmail.com
Sat Jan 14 18:23:15 CET 2012


On Fri, Jan 13, 2012 at 6:34 AM, doggysaywhat <chwhite at ucsd.edu> wrote:
> Hello all.
>
> I have two data frames.
> Group       Start          End
> G1                     200               700
> G2                     500               1000
> G3                     2000        3000
> G4                     4000        6000
> G5                     7000        8000
>
>
> and
>
> Pos                 C0              C1
> 200                 0.9           0.6
> 500               0.8             0.8
> 800                 0.9           0.7
> 1000              0.7           0.6
> 2000            0.6               0.4
> 2500            1.2             0.8
> 3000            0.6             1.5
> 3500            0.7             0.7
> 4000            0.8               0.8
> 4500            0.6             0.6
> 5000              0.9           0.9
> 5500            0.7               0.8
> 6000            0.8             0.7
> 6500            0.4             0.4
> 7000              0.5           0.8
> 7500            0.7               0.9
> 8000            0.9             0.5
> 8500            0.8             0.6
> 9000            0.9             0.8
>
>
> I need to conditionally average all values in columns C0 and C1 based upon
> the bins I defined in the first data frame.  For example, for the bin G1 in
> the first dataframe, the values are 200 to 700 so i would average the value
> at pos 200 (0.9) and 500 (0.8) for C0 and then perform the same thing for
> C1.
>
> I can do this in excel with array formulas but I'm relatively new to R and
> would like know if there is a function that will perform the same action.  I
> don't know if this will help, but the excel array function I used was
> average(if(range>=start)*(range<=end),range)).  Where the range is the
> entire pos column.
>
> Initially I looked at the aggregate function.   I can use aggregate when I
> give a single vector to be used for grouping such as (A,B,C) but I'm not
> sure how to define grouping as the bin 200-500 and the second bin as
> 500-1000 etc. and use that as my grouping vector.
>

Here is an sqldf solution where the two input data frames are d1 and
d2 (as in Jeff's post).  Note that Group is quoted since its an SQL
keyword:

library(sqldf)

sqldf("select d1.'Group', avg(d2.C0), avg(d2.C1)
   from d1, d2
   where d2.Pos between d1.Start and d1.End
   group by d1.'Group'")

The result is;

  Group avg(d2.C0) avg(d2.C1)
1    G1       0.85  0.7000000
2    G2       0.80  0.7000000
3    G3       0.80  0.9000000
4    G4       0.76  0.7600000
5    G5       0.70  0.7333333

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



More information about the R-help mailing list