[BioC] What are the best packages to compare multiple DE gene lists?

Stephane Plaisance | VIB | stephane.plaisance at vib.be
Thu Aug 28 10:50:01 CEST 2014


Thanks a lot Jose, I'll add RankProd to the top of my to-do list!

;-)

Stephane Plaisance
stephane.plaisance at vib.be





On 28 Aug 2014, at 10:11, Jose Garcia <garciamanteiga.josemanuel at hsr.it> wrote:

> Dear Stephane,
> If I understood well what you need, you could use RankProd package that uses rank product non parametric approach to give a  p-value to the genes across different studies based on the ranking they achieve by log2FC. It permits to make such "meta-analyses" comparing lists of genes produced in different analysis.
> Jose
> 
> 
> 2014-08-28 9:03 GMT+02:00 Stephane Plaisance | VIB | <stephane.plaisance at vib.be>:
> Dear Jim,
> 
> Thanks very much for this straightforward approach. I will certainly try it. My aim is also to take into account the p-values and, where applicable, the log-FC values attached to each gene, so that more than just the ranking is used. I know of bioinformatics tools (Endeavour) that rank lists of apples and pears using specific methods, but I have no idea where exactly to start.
> 
> Thanks anyway for the help and code.
> 
> So far I have found in the Bioc pages:
> matchBox
> OrderedList
> GeneSelector
> rankrank
> 
> I have not tried any of them yet, so if anybody has preferences, I am all ears.
> 
> Cheers
> Stephane Plaisance
> stephane.plaisance at vib.be
> 
> 
> 
> 
> 
> On 27 Aug 2014, at 16:29, James W. MacDonald <jmacdon at uw.edu> wrote:
> 
> > Hi Stephane,
> >
> > If I understand you correctly, you have already made the comparisons and now simply want to rank genes by the number of comparisons in which they were found significant. I don't know of a particular package for doing this, but it would be really easy to do using functions in base R. All you would need to do (assuming you have a consistent identifier, such as Entrez Gene IDs, for each comparison) is concatenate all the IDs into a single vector and then count occurrences:
> >
> > mybigvec <- c(<all the DE gene IDs go here>)
> > mylst <- split(mybigvec, mybigvec)
> > df <- data.frame(ID = names(mylst), count = sapply(mylst, length))
> > df <- df[order(df$count, decreasing = TRUE),]
> >
> > You could also take things like gene symbols along for the ride by starting with a data.frame:
> >
> > mybigdf <- data.frame(symbols = <concatenate symbols from all comps>, geneid = <concatenate gene IDs from all comps>)
> > mylst <- split(mybigdf, mybigdf$geneid)
> > df <- data.frame(ID = names(mylst), count = sapply(mylst, nrow), symbol = sapply(mylst, function(x) x$symbols[1]))
> > df <- df[order(df$count, decreasing = TRUE),]
> >
> > Best,
> >
> > Jim
> >
> >
> >
> >
> > On Wed, Aug 27, 2014 at 6:48 AM, Stephane Plaisance | VIB | <stephane.plaisance at vib.be> wrote:
> > I have genome-wide/exome-wide lists of DE genes resulting from microarray (MA) and/or RNA-Seq analyses using multiple methods (likely reporting different genes even for the same samples, due to technology biases). I would like to combine these lists into a single general list in which DE targets found repeatedly are pushed up and unique hits are ranked lower.
> >
> > What method/package should I start with?
> >
> > Thanks
> >
> > Stephane Plaisance
> > stephane.plaisance at vib.be
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> >
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
> 
> 
> 
> 
> 
> -- 
> Jose M. Garcia Manteiga PhD
> Data Analysis in Functional Genomics
> Center for Translational Genomics and BioInformatics
> Dibit2-Basilica, 4A3
> San Raffaele Scientific Institute
> Via Olgettina 58, 20132 Milano (MI), Italy
> 
> Tel: +39-02-2643-9144
> e-mail: garciamanteiga.josemanuel at hsr.it 

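As a rough illustration of the rank product idea Jose describes, which uses log2FC values rather than only list membership, here is a minimal base-R sketch. It does not call the RankProd package API. The log2FC matrix lfc, its gene and study names, and its values are hypothetical. The permutation p-value is a simplified pooled estimate, not RankProd's exact procedure.

```r
# Hypothetical log2 fold changes: rows = genes, columns = studies/methods
lfc <- cbind(
  study1 = c(g1 = 2.1, g2 = 0.3, g3 = 1.5, g4 = -0.2),
  study2 = c(g1 = 1.8, g2 = 0.1, g3 = 0.9, g4 =  0.4),
  study3 = c(g1 = 2.5, g2 = 1.2, g3 = 1.1, g4 = -0.5)
)

# Rank genes within each study; rank 1 = largest log2FC (most up-regulated)
ranks <- apply(-lfc, 2, rank, ties.method = "average")

# Rank product: geometric mean of each gene's ranks across studies
rank_product <- function(r) exp(rowMeans(log(r)))
rp <- rank_product(ranks)

# Simplified permutation null: shuffle ranks independently within each study
set.seed(1)
n_perm <- 1000
null_rp <- replicate(n_perm, rank_product(apply(ranks, 2, sample)))

# Pooled estimate of P(null RP <= observed RP); small RP = consistently near the top
pval <- sapply(rp, function(x) mean(null_rp <= x))

print(sort(rp))
print(pval)
```

A small rank product means a gene sits near the top of every list. To look for consistently down-regulated genes instead, rank lfc in increasing order (rank(x) rather than rank(-x)).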

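To check Jim's counting recipe end to end, here it is run on small hypothetical lists. The names comp1, comp2 and comp3 and the IDs g1 to g5 are made up. The sketch follows his steps, using the variable name mylst consistently.

```r
# Hypothetical DE gene lists from three comparisons (IDs are placeholders)
comp1 <- c("g1", "g2", "g3")
comp2 <- c("g1", "g3", "g4")
comp3 <- c("g1", "g5")

# Concatenate all IDs, split by ID, and count how often each ID occurs
mybigvec <- c(comp1, comp2, comp3)
mylst <- split(mybigvec, mybigvec)
df <- data.frame(ID = names(mylst), count = sapply(mylst, length))

# Genes found in the most comparisons come first
df <- df[order(df$count, decreasing = TRUE), ]
print(df)
```

The same counts can also be obtained with sort(table(mybigvec), decreasing = TRUE). Both approaches count occurrences, so if a gene can appear more than once within a single list, apply unique() to each list before concatenating.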


