[R] aggregating using 'with' function

AC Del Re delre at wisc.edu
Sun Feb 21 14:40:30 CET 2010


Wow! Jim, this is really impressive. I can't wrap my head around how
you figured this out.

Thank you,

AC

On Sun, Feb 21, 2010 at 12:02 AM, jim holtman <jholtman at gmail.com> wrote:
> This will do it.  You can see two different values for id=1:
>
>>  x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
>> x
>    id mod1      r
> 1   1    1  0.980
> 2   4    1  0.640
> 3   7    1  0.490
> 4  10    1  0.180
> 5   1    2  0.295
> 6   5    2  0.490
> 7   8    2  0.330
> 8  11    2  0.600
> 9   6    3 -0.040
> 10  9    3  0.580
> 11 12    3  0.210
>> # choose random duplicate to use
>> do.call(rbind, lapply(split(x, x$id), function(.data) .data[sample(nrow(.data), 1), ]))
>    id mod1     r
> 1   1    1  0.98
> 4   4    1  0.64
> 5   5    2  0.49
> 6   6    3 -0.04
> 7   7    1  0.49
> 8   8    2  0.33
> 9   9    3  0.58
> 10 10    1  0.18
> 11 11    2  0.60
> 12 12    3  0.21
>>
>> # choose a random duplicate again to see whether a different one comes up
>> do.call(rbind, lapply(split(x, x$id), function(.data) .data[sample(nrow(.data), 1), ]))
>    id mod1      r
> 1   1    2  0.295
> 4   4    1  0.640
> 5   5    2  0.490
> 6   6    3 -0.040
> 7   7    1  0.490
> 8   8    2  0.330
> 9   9    3  0.580
> 10 10    1  0.180
> 11 11    2  0.600
> 12 12    3  0.210
>>
>>
>
>
> On Sat, Feb 20, 2010 at 9:50 PM, AC Del Re <acdelre at gmail.com> wrote:
>>
>> OK, this is great, Jim. Last question: How about if I want the 1 copy
>> of each id to be selected randomly versus taking the first one?
>>
>> AC
>>
>> On Sat, Feb 20, 2010 at 8:37 PM, jim holtman <jholtman at gmail.com> wrote:
>> > I am not sure what you mean by eliminating a row.  If you want only one
>> > copy of each 'id', and it should be the first one, then you can use
>> > 'duplicated':
>> >
>> >> x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
>> >> x
>> >    id mod1      r
>> > 1   1    1  0.980
>> > 2   4    1  0.640
>> > 3   7    1  0.490
>> > 4  10    1  0.180
>> > 5   1    2  0.295
>> > 6   5    2  0.490
>> > 7   8    2  0.330
>> > 8  11    2  0.600
>> > 9   6    3 -0.040
>> > 10  9    3  0.580
>> > 11 12    3  0.210
>> >> subset(x, !duplicated(id))
>> >    id mod1     r
>> > 1   1    1  0.98
>> > 2   4    1  0.64
>> > 3   7    1  0.49
>> > 4  10    1  0.18
>> > 6   5    2  0.49
>> > 7   8    2  0.33
>> > 8  11    2  0.60
>> > 9   6    3 -0.04
>> > 10  9    3  0.58
>> > 11 12    3  0.21
>> >
>> >
>> > On Sat, Feb 20, 2010 at 8:07 PM, AC Del Re <delre at wisc.edu> wrote:
>> >>
>> >> Perfect! Thanks Jim.
>> >>
>> >> Do you know how I could then reduce the data even further?
>> >> Specifically, I would like to reduce it to one row per id. In this
>> >> dataset, id 1 would have one row eliminated.
>> >> Assume the data set is much larger, so rows cannot be removed one at a
>> >> time by visual inspection.
>> >>
>> >>
>> >> Thank you,
>> >>
>> >> AC
>> >>
>> >> On Sat, Feb 20, 2010 at 6:26 PM, jim holtman <jholtman at gmail.com> wrote:
>> >> > This seems to work fine (notice the missing 'c(...)'; why did you
>> >> > think you needed it?):
>> >> >
>> >> >>  with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
>> >> >    id mod1      r
>> >> > 1   1    1  0.980
>> >> > 2   4    1  0.640
>> >> > 3   7    1  0.490
>> >> > 4  10    1  0.180
>> >> > 5   1    2  0.295
>> >> > 6   5    2  0.490
>> >> > 7   8    2  0.330
>> >> > 8  11    2  0.600
>> >> > 9   6    3 -0.040
>> >> > 10  9    3  0.580
>> >> > 11 12    3  0.210
>> >> >>
>> >> >
>> >> >
>> >> > On Sat, Feb 20, 2010 at 6:54 PM, AC Del Re <delre at wisc.edu> wrote:
>> >> >>
>> >> >> Hi All,
>> >> >>
>> >> >> I am interested in aggregating a data frame based on two
>> >> >> categories: the mean effect size (r) for each combination of 'id'
>> >> >> and 'mod1'. The 'with' function works well when aggregating on one
>> >> >> category (e.g., based on 'id' below) but doesn't work if I try two
>> >> >> categories. How can this be accomplished?
>> >> >>
>> >> >> # sample data
>> >> >>
>> >> >> id<-c(1,1,1,rep(4:12))
>> >> >> n<-c(10,20,13,22,28,12,12,36,19,12, 15,8)
>> >> >> r<-c(.98,.56,.03,.64,.49,-.04,.49,.33,.58,.18, .6,.21)
>> >> >> mod1<-factor(c(1,2,2, rep(c(1,2,3),3)))
>> >> >> mod2<-c(1,2,15,rep(3,9))
>> >> >> datas<-data.frame(id,n,r,mod1,mod2)
>> >> >>
>> >> >> # one category works perfectly:
>> >> >>
>> >> >> with(datas,  aggregate(list(r = r),  by = list(id = id),mean))
>> >> >>
>> >> >>    id          r
>> >> >> 1   1  0.5233333
>> >> >> 2   4  0.6400000
>> >> >> 3   5  0.4900000
>> >> >> 4   6 -0.0400000
>> >> >> 5   7  0.4900000
>> >> >> 6   8  0.3300000
>> >> >> 7   9  0.5800000
>> >> >> 8  10  0.1800000
>> >> >> 9  11  0.6000000
>> >> >> 10 12  0.2100000
>> >> >>
>> >> >> # trying with 2 categories:
>> >> >>
>> >> >>  with(datas, aggregate(list(r = r), by = list(c(id = id, mod1 = mod1)), mean))
>> >> >>
>> >> >> Error in FUN(X[[1L]], ...) : arguments must have same length
>> >> >>
>> >> >> Thank you,
>> >> >>
>> >> >> AC
>> >> >>
>> >> >> ______________________________________________
>> >> >> R-help at r-project.org mailing list
>> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> PLEASE do read the posting guide
>> >> >> http://www.R-project.org/posting-guide.html
>> >> >> and provide commented, minimal, self-contained, reproducible code.
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Jim Holtman
>> >> > Cincinnati, OH
>> >> > +1 513 646 9390
>> >> >
>> >> > What is the problem that you are trying to solve?
>> >> >
>> >
>> >
>> >
>> >
>
>
>
>
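
A note on the error in the original post, offered as a minimal sketch rather than anything posted in the thread: c(id = id, mod1 = mod1) concatenates the two vectors into a single vector of length 24, so the grouping variable no longer lines up with the 12 values of r, which is why aggregate complains that the arguments must have the same length. The sketch rebuilds the sample data from the thread and also shows the formula interface to aggregate, an alternative the thread does not use. The name x2 is introduced here only for illustration.

```r
# Sample data from the original post
id    <- c(1, 1, 1, 4:12)
n     <- c(10, 20, 13, 22, 28, 12, 12, 36, 19, 12, 15, 8)
r     <- c(.98, .56, .03, .64, .49, -.04, .49, .33, .58, .18, .6, .21)
mod1  <- factor(c(1, 2, 2, rep(c(1, 2, 3), 3)))
mod2  <- c(1, 2, 15, rep(3, 9))
datas <- data.frame(id, n, r, mod1, mod2)

# c() glues id and mod1 end to end: one vector of 24 values, not two of 12
length(c(id = id, mod1 = mod1))

# Each grouping variable goes in as its own list element
x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))

# Alternative (not from the thread): the formula interface
x2 <- aggregate(r ~ id + mod1, data = datas, FUN = mean)
```

Both calls group on every observed id/mod1 combination, so x and x2 should contain the same group means.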

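For the random selection step, a small sketch under stated assumptions: the data frame x is typed in by hand from the aggregated output shown in the thread, the seed value is arbitrary, and pick_one, shuffled, res and res2 are names made up for this example. Setting a seed makes the random choice repeatable. The second variant, shuffling the rows and then reusing the 'duplicated' idea from earlier in the thread, is an alternative that was not posted.

```r
# Aggregated result as printed in the thread (entered by hand)
x <- data.frame(
  id   = c(1, 4, 7, 10, 1, 5, 8, 11, 6, 9, 12),
  mod1 = factor(c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3)),
  r    = c(0.980, 0.640, 0.490, 0.180, 0.295, 0.490, 0.330, 0.600,
           -0.040, 0.580, 0.210)
)

set.seed(42)  # arbitrary seed, only so the draw can be repeated

# Thread's approach: split by id, draw one row from each piece, re-stack
pick_one <- function(d) d[sample(nrow(d), 1), ]
res <- do.call(rbind, lapply(split(x, x$id), pick_one))

# Alternative (assumption, not from the thread): shuffle all rows, then
# keep the first occurrence of each id
shuffled <- x[sample(nrow(x)), ]
res2 <- subset(shuffled, !duplicated(id))
```

Either way the result has one row per id. Which of the two id = 1 rows is kept depends on the seed.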

