[BioC] What are the best packages to compare multiple DE gene lists?

Stephane Plaisance | VIB | stephane.plaisance at vib.be
Thu Aug 28 09:03:38 CEST 2014

Dear Jim,

Thanks very much for this straightforward approach. I will certainly try it. My aim is to also take into account the p-values and, where applicable, the log-FC values attached to each gene, so that more than just ranking is used. I know of biotools (Endeavour) that rank lists of apples and pears using specific methods, but I have no idea where exactly to start.

Thanks anyway for the help and code.
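
One possible starting point for carrying p-values and log-FC values along, as a minimal base R sketch rather than an established method: it assumes the results are stacked into one long table with one row per gene per comparison. The column names (geneid, comparison, pvalue, logFC) and all values are made up for illustration. Genes are ranked by the number of comparisons in which they appear, with the smallest p-value and the mean absolute log-FC used only as tie-breakers.

```r
# Hypothetical long-format table: one row per gene per comparison.
# All identifiers and values are invented for illustration.
de <- data.frame(
  geneid     = c("g1", "g2", "g3", "g1", "g2", "g1", "g4"),
  comparison = c("A", "A", "A", "B", "B", "C", "C"),
  pvalue     = c(0.001, 0.02, 0.04, 0.003, 0.01, 0.0005, 0.03),
  logFC      = c(2.1, -1.2, 0.8, 1.8, -1.5, 2.4, 1.1),
  stringsAsFactors = FALSE
)

# One data.frame per gene
by_gene <- split(de, de$geneid)

# Per-gene summary: number of comparisons, best p-value, mean |log-FC|
summary_df <- data.frame(
  geneid         = names(by_gene),
  count          = sapply(by_gene, nrow),
  min_p          = sapply(by_gene, function(x) min(x$pvalue)),
  mean_abs_logFC = sapply(by_gene, function(x) mean(abs(x$logFC))),
  stringsAsFactors = FALSE
)

# Rank: most comparisons first, then smallest p-value, then largest effect
summary_df <- summary_df[order(-summary_df$count,
                               summary_df$min_p,
                               -summary_df$mean_abs_logFC), ]
rownames(summary_df) <- NULL
print(summary_df)
```

The tie-breaking order is an arbitrary choice. Note also that p-values obtained from the same samples with different methods are not independent, so this sketch only sorts by them and does not attempt to combine them into a single statistic.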

So far I have found in the Bioc pages:

I have tried none so if anybody has preferences, I am all ears.

Stephane Plaisance
stephane.plaisance at vib.be

On 27 Aug 2014, at 16:29, James W. MacDonald <jmacdon at uw.edu> wrote:

> Hi Stephane,
> If I understand you correctly, you have already made comparisons and now simply want to rank genes based on the number of comparisons in which they were found significant. I don't know of a particular package for doing this, but it would be really easy to do using functions in base R. All you would need to do (assuming you have some consistent identifier like Entrez Gene IDs for each comparison) would be to concatenate all the IDs into a single vector, and then count occurrences:
> mybigvec <- c(<all the DE gene IDs go here>)
> mylst <- split(mybigvec, mybigvec)
> df <- data.frame(ID=names(mylst), count=sapply(mylst, length))
> df <- df[order(df$count, decreasing = TRUE),]
> You could also take things like gene symbols along for the ride by starting with a data.frame:
> mybigdf <- data.frame(symbols = <concatenate symbols from all comps>, geneid = <concatenate gene IDs from all comps>)
> mylst <- split(mybigdf, mybigdf$geneid)
> df <- data.frame(ID = names(mylst), count = sapply(mylst, nrow), symbol = sapply(mylst, function(x) x$symbols[1]))
> df <- df[order(df$count, decreasing = TRUE),]
> Best,
> Jim
> On Wed, Aug 27, 2014 at 6:48 AM, Stephane Plaisance | VIB | <stephane.plaisance at vib.be> wrote:
> I have full genome/exome lists of DE genes resulting from MA and/or RNASeq analyses using multiple methods (likely showing different genes even from the same samples due to technology biases). I would like to rank these lists to create a general list where redundant DE targets are pushed up and unique hits ranked lower.
> What method/package should I start with?
> Thanks
> Stephane Plaisance
> stephane.plaisance at vib.be
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099 
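
The counting approach quoted above can be tried on toy data. This is a minimal sketch, assuming each comparison yields a character vector of gene IDs; the list names (compA, compB, compC) and the IDs are invented for illustration. It uses table() in place of split() plus sapply(), which gives the same counts, and applies unique() per list so that a gene is counted at most once per comparison.

```r
# Toy DE gene ID lists from three hypothetical comparisons (made-up IDs)
de_lists <- list(
  compA = c("1017", "7157", "672"),
  compB = c("7157", "672", "5290"),
  compC = c("7157", "2064")
)

# Concatenate all IDs; unique() so each gene counts once per comparison
all_ids <- unlist(lapply(de_lists, unique), use.names = FALSE)

# Count the number of comparisons in which each ID appears
counts <- table(all_ids)

ranked <- data.frame(
  ID    = names(counts),
  count = as.integer(counts),
  stringsAsFactors = FALSE
)

# Genes found in the most comparisons come first
ranked <- ranked[order(ranked$count, decreasing = TRUE), ]
rownames(ranked) <- NULL
print(ranked)
```

In this toy input, "7157" appears in all three lists and "672" in two, so they end up at the top of the ranking.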
