[R] aggregating data

jim holtman jholtman at gmail.com
Thu Jun 30 14:00:27 CEST 2011


If you have a large data table, you might consider using the 'data.table'
package, which generally performs better than 'plyr':

> x <- read.table(textConnection("Gene     ProbeID               Expression_Level
+ A             1              0.34
+ A             2              0.21
+ E              3              0.11
+ A             4              0.21
+ F              5              0.56
+ F              6              0.87"), header = TRUE)
> closeAllConnections()
> require(data.table)
> x <- data.table(x)
> x[,
+     list(nProbes = length(ProbeID)
+         , Mean_Level = mean(Expression_Level)
+         )
+     , by = Gene
+  ]
     Gene nProbes Mean_Level
[1,]    A       3  0.2533333
[2,]    E       1  0.1100000
[3,]    F       2  0.7150000
>
>
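If installing 'data.table' is not an option, the same per-gene summary can be sketched with base R's aggregate() and merge(). This is an illustrative alternative, not part of the original reply. It assumes the toy data above, rebuilt inline as a plain data.frame, and the object names counts, means and summ are made up for the example. It should reproduce the nProbes and Mean_Level values shown above, though on the full 180,000 rows it may well be slower than the data.table version.

```r
# Toy data from the example above, rebuilt as a plain data.frame
x <- data.frame(
  Gene = c("A", "A", "E", "A", "F", "F"),
  ProbeID = 1:6,
  Expression_Level = c(0.34, 0.21, 0.11, 0.21, 0.56, 0.87),
  stringsAsFactors = FALSE
)

# Number of probes per gene (the same counts as table(x$Gene))
counts <- aggregate(ProbeID ~ Gene, data = x, FUN = length)

# Mean expression level per gene
means <- aggregate(Expression_Level ~ Gene, data = x, FUN = mean)

# Join the two summaries on Gene and reuse the column names from the reply
summ <- merge(counts, means, by = "Gene")
names(summ) <- c("Gene", "nProbes", "Mean_Level")
print(summ)
```

Computing the two statistics separately and merging keeps each aggregate() call returning a simple vector column, rather than the matrix column you get when FUN returns several values at once.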


On Thu, Jun 30, 2011 at 3:28 AM, Max Mariasegaram
<max.mariasegaram at qut.edu.au> wrote:
> Hi,
>
> I am interested in using the cast function in R to perform some aggregation. I did once manage to get it working, but have now forgotten how I did this. So here is my dilemma. I have about 180,000 probes, each corresponding to a gene; what I'd like to obtain is a frequency count of the probes for each gene.
>
> The data would look something like this:
>
> Gene     ProbeID               Expression_Level
> A             1              0.34
> A             2              0.21
> E              3              0.11
> A             4              0.21
> F              5              0.56
> F              6              0.87
> .
> .
> .
> (180000 data points)
>
> In each case, the probeID is unique. The output I am looking for is something like this:
>
> Gene     No.ofprobes      Mean_expression
> A             3              0.25
>
> Is there an easy way to do this using "cast" or "melt"? Ideally, I would also like to see the unique probes corresponding to each gene in the wide format.
>
> Thanks in advance
> Max
>
> Maxy Mariasegaram| Research Fellow | Australian Prostate Cancer Research Centre| Level 1, Building 33 | Princess Alexandra Hospital | 199 Ipswich Road, Brisbane QLD 4102 Australia | t: 07 3176 3073| f: 07 3176 7440 | e: mariaseg at qut.edu.au
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
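The question also asks to see each gene's probes in wide format, which the reply above does not cover. A minimal sketch of two ways to do that in base R, assuming the same toy data: aggregate() with paste() to collapse each gene's probe IDs into one string, and base R's reshape() (not the reshape package's cast() that the question mentions) to spread them into one column per probe. The helper column slot and the objects probes and wide are hypothetical names chosen for illustration.

```r
# Toy data from the example above
x <- data.frame(
  Gene = c("A", "A", "E", "A", "F", "F"),
  ProbeID = 1:6,
  Expression_Level = c(0.34, 0.21, 0.11, 0.21, 0.56, 0.87),
  stringsAsFactors = FALSE
)

# Option 1: one row per gene, its probe IDs collapsed into a single string
probes <- aggregate(ProbeID ~ Gene, data = x,
                    FUN = function(p) paste(p, collapse = ","))

# Option 2: a true wide layout, one column per probe.
# 'slot' (hypothetical helper column) numbers the probes within each gene: 1, 2, 3, ...
x$slot <- ave(x$ProbeID, x$Gene, FUN = seq_along)
wide <- reshape(x[, c("Gene", "slot", "ProbeID")],
                idvar = "Gene", timevar = "slot", direction = "wide")

# Genes with fewer probes than the largest gene get NA in the extra columns
print(probes)
print(wide)
```

With about 180,000 probes, the wide layout gets as many columns as the gene with the most probes, so the collapsed-string version may be the more practical of the two.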



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


