[BioC] What are the best packages to compare multiple DE gene lists?

James W. MacDonald jmacdon at uw.edu
Wed Aug 27 16:29:31 CEST 2014


Hi Stephane,

If I understand you correctly, you have already made comparisons and now
simply want to rank genes based on the number of comparisons in which they
were found significant. I don't know of a particular package for doing
this, but it is easy to do using functions in base R. All you would need
to do (assuming you have some consistent identifier like Entrez Gene IDs
for each comparison) is concatenate all the IDs into a single vector and
then count occurrences:

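# Concatenate the DE gene IDs from every comparison into one vector
mybigvec <- c(<all the DE gene IDs go here>)
# Group identical IDs; each group's length is that ID's occurrence count
mylst <- split(mybigvec, mybigvec)
df <- data.frame(ID=names(mylst), count=sapply(mylst, length))
# Most frequently found genes first
df <- df[order(df$count, decreasing = TRUE),]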
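As a minimal runnable sketch of the counting step above, here is the same
idea on toy data. The comparison names and gene IDs are made up for
illustration, and I am assuming you want each comparison to count a gene at
most once, hence the unique() call. It uses table() instead of
split()/sapply(), which gives the same tally.

```r
# Hypothetical DE gene IDs from three comparisons (made-up values)
de_lists <- list(
  compA = c("1001", "1002", "1003"),
  compB = c("1002", "1003", "1004"),
  compC = c("1003", "1005")
)

# Drop within-list duplicates so each comparison counts a gene at most once
mybigvec <- unlist(lapply(de_lists, unique), use.names = FALSE)

# table() tallies occurrences; sort so the most frequent genes come first
counts <- sort(table(mybigvec), decreasing = TRUE)
df <- data.frame(ID = names(counts), count = as.vector(counts),
                 stringsAsFactors = FALSE)
print(df)
```

Genes found in every comparison end up at the top, and genes unique to one
comparison at the bottom.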

You could also take things like gene symbols along for the ride by starting
with a data.frame:

mybigdf <- data.frame(symbol = <concatenate symbols from all comps>,
                      geneid = <concatenate gene IDs from all comps>)
# One data.frame per gene ID; the row count is the number of occurrences
mylst <- split(mybigdf, mybigdf$geneid)
df <- data.frame(ID = names(mylst), count = sapply(mylst, nrow),
                 symbol = sapply(mylst, function(x) as.character(x$symbol[1])))
df <- df[order(df$count, decreasing = TRUE),]
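A runnable sketch of the data.frame variant, again on made-up data. The
comparison names, gene IDs and symbols are hypothetical, and the found_in
column is an extra of mine, not something you asked for: it records which
comparisons each gene came from, which may be handy when the lists come
from different technologies.

```r
# Hypothetical per-comparison results (IDs and symbols are made up)
comps <- list(
  compA = data.frame(geneid = c("1001", "1002"),
                     symbol = c("GENE1", "GENE2"), stringsAsFactors = FALSE),
  compB = data.frame(geneid = c("1002", "1003"),
                     symbol = c("GENE2", "GENE3"), stringsAsFactors = FALSE),
  compC = data.frame(geneid = c("1002", "1001"),
                     symbol = c("GENE2", "GENE1"), stringsAsFactors = FALSE)
)

# Stack the comparisons, tagging each row with the comparison it came from
tagged <- Map(function(d, nm) cbind(d, comp = nm, stringsAsFactors = FALSE),
              comps, names(comps))
mybigdf <- do.call(rbind, tagged)

# One data.frame per gene ID, then summarise each one
mylst <- split(mybigdf, mybigdf$geneid)
df <- data.frame(
  ID = names(mylst),
  symbol = sapply(mylst, function(x) x$symbol[1]),
  count = sapply(mylst, nrow),
  found_in = sapply(mylst, function(x) paste(x$comp, collapse = ",")),
  stringsAsFactors = FALSE
)
df <- df[order(df$count, decreasing = TRUE), ]
print(df)
```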

Best,

Jim




On Wed, Aug 27, 2014 at 6:48 AM, Stephane Plaisance | VIB | <
stephane.plaisance at vib.be> wrote:

> I have full genome/exome lists of DE resulting from MA and/or RNASeq
> analyses using multiple methods (likely showing different genes even from
> the same samples due to technology biases). I would like to rank these
> lists to create a general list where redundant DE targets are pushed up and
> unique hits ranked lower.
>
> What method/package should I start with?
>
> Thanks
>
> Stephane Plaisance
> stephane.plaisance at vib.be
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>



-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
