[R] Problem with aggregating data across time points

Allan Engelhardt allane at cybaea.com
Fri Jul 2 18:09:53 CEST 2010


On 02/07/10 16:21, Chris Beeley wrote:
> Hello-
>
> I have a dataset which basically looks like this:
>
> Location  Sex  Date      Time  Verbal  Self harm  Violence_objects  Violence
> A         1    1-4-2007  1800  3       0          1                 3
> A         1    1-4-2007  1230  2       1          2                 4
> D         2    2-4-2007  1100  0       4          0                 0
> ...
>
> I've put a dput of the first section of the data at the end of this
> email. [...]
>
> What I want to do is:
>
> A) sum each of the dependent variables for each of the dates (so e.g.
> in the example above for 1-4-2007 it would be 3+2=5, 0+1=1, 1+2=3, and
> 3+4=7 for each of the variables)
>    

If 'data' is the data at the end of your email, then

>  aggregate(cbind(verbal,self.harm,violence_objects,violence) ~ Date, data = data, FUN = sum)
      Date verbal self.harm violence_objects violence
1 01/04/07     25        15                3        9
2 02/04/07     24         6                8       13
3 03/04/07     17        13                0       10


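is one approach.  Read help("aggregate"), and don't forget the na.action=
argument: the formula method defaults to na.omit, which drops every row
with a missing value in any of the variables before the sums are taken.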
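
To make the na.action= point concrete, here is a minimal sketch on made-up data. The data frame 'toy' and its values are my own invention for illustration (only the column names are borrowed from above), so treat it as a sketch rather than a recipe for your real data.

```r
# Toy data: hypothetical values, column names borrowed from the example above.
toy <- data.frame(
  Date      = c("01/04/07", "01/04/07", "02/04/07"),
  verbal    = c(3, 2, NA),
  self.harm = c(0, 1, 4)
)

# Default for the formula method (na.action = na.omit): any row with an NA
# in a formula variable is dropped, so 02/04/07 vanishes from the result.
dropped <- aggregate(cbind(verbal, self.harm) ~ Date, data = toy, FUN = sum)

# Keep all rows (na.pass) and let sum() skip the missing values instead;
# na.rm = TRUE is passed through to FUN.
kept <- aggregate(cbind(verbal, self.harm) ~ Date, data = toy, FUN = sum,
                  na.action = na.pass, na.rm = TRUE)

print(dropped)
print(kept)
```

Here 'dropped' has a single row while 'kept' has one row per date. Which behaviour you want depends on what a missing value means in your data.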


> B) do this sum, but only in each location this time (location is the
> first variable)- so the sum for 1-4-2007 in location A, sum for
> 1-4-2007 in location B, and so on and so on. Because this is divided
>    

The basic approach could be

>  aggregate(cbind(verbal,self.harm,violence_objects,violence) ~ Date + Location, data = data, FUN = sum)
       Date Location verbal self.harm violence_objects violence
1  01/04/07        A      7         1                0        3
2  02/04/07        A      8         2                0        1
3  03/04/07        A      0         0                0        2
4  01/04/07        B      3         2                0        1
5  02/04/07        B      4         2                0        0
6  03/04/07        B      4         0                0        3
7  01/04/07        C      4         2                3        2
8  02/04/07        C      0         0                4        2
9  03/04/07        C      1         1                0        5
10 01/04/07        D      7         6                0        3
11 02/04/07        D      0         0                0        9
12 03/04/07        D      4        11                0        0
13 01/04/07        E      4         3                0        0
14 02/04/07        E      4         0                4        0
15 03/04/07        E      8         1                0        0
16 01/04/07        F      0         1                0        0
17 02/04/07        F      8         2                0        1



> across locations, some dates will have no data going into them and
> will return 0 sums. Crucially I still want these dates to appear- so
> e.g. 21-5-2008 would appear as 0 0 0 0, then 22-5-2008 might have 1 2
> 0 0, then 23-5-2008 0 0 0 0 again, and etc.
>    

Why?

But variations on

>  data2 <- data[!(as.numeric(data$Date) == 3 & data$Location == "B"), ] # For example: drop one Date/Location combination
>  z <- with(data2, tapply(verbal, list(Date, Location), FUN = sum))
>  z[is.na(z)] <- 0 # empty cells come back from tapply() as NA
>  print(z)
           A B C D E F
         0 0 0 0 0 0 0
01/04/07 0 7 3 4 7 4 0
02/04/07 0 8 0 0 0 4 8
03/04/07 0 0 4 1 4 8 0



will perhaps work for you.
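
If the dates that must appear include days with no rows at all in the data, one possible variation is to give the date factor a level for every calendar day before calling tapply(). The sketch below uses made-up data; the names 'toy', 'Day' and 'all_days' are hypothetical, and the dd/mm/yy format is an assumption based on the output shown earlier.

```r
# Toy data: hypothetical values; dd/mm/yy date format is assumed.
toy <- data.frame(
  Date     = c("01/04/07", "01/04/07", "03/04/07"),
  Location = c("A", "B", "A"),
  verbal   = c(3, 2, 4)
)

# Parse the dates and list every calendar day from the first to the last.
d <- as.Date(toy$Date, format = "%d/%m/%y")
all_days <- seq(min(d), max(d), by = "day")

# A factor whose levels include the unobserved days keeps those days
# as rows of the table, because tapply() does not drop unused levels.
toy$Day <- factor(format(d, "%Y-%m-%d"), levels = format(all_days, "%Y-%m-%d"))

z <- with(toy, tapply(verbal, list(Day, Location), FUN = sum))
z[is.na(z)] <- 0  # empty Day/Location cells come back as NA
print(z)
```

The result has a row for 2007-04-02 even though no observation falls on that day. xtabs(verbal ~ Day + Location, data = toy) should give the same table with zeros filled in directly, if you prefer to skip the NA replacement.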

Hope this helps

Allan


