[BioC] What are the best packages to compare multiple DE gene lists?

James W. MacDonald jmacdon at uw.edu
Thu Aug 28 16:38:20 CEST 2014


Hi Stephane,

If you want to be more systematic about the comparisons, you might also
consider the GeneMeta package.

Best,

Jim




On Thu, Aug 28, 2014 at 3:03 AM, Stephane Plaisance | VIB | <
stephane.plaisance at vib.be> wrote:

> Dear Jim,
>
> Thanks very much for this straightforward approach. I will certainly try
> it. My aim is to also take into account the p-values and, where applicable,
> the log-FC values attached to each gene, so that more than just the
> ranking is used. I know of biotools (Endeavour) that rank lists of apples
> and pears using specific methods, but I have no idea where exactly to start.
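
One possible way to use the p-values and log-FC values rather than only counts is sketched below in base R. It combines the per-gene p-values across comparisons with Fisher's method and averages the log-FC values. This is an illustration only, not something proposed in this thread: the data frame `res` and its columns `geneid`, `pvalue` and `logFC` are made up, and Fisher's method assumes independent p-values, which does not hold when the same samples are analysed with several methods, so the combined value is best treated as a ranking score rather than a real p-value.

```r
# Toy results from three comparisons, stacked in long format (hypothetical data)
res <- data.frame(
  geneid = c("101", "102", "103", "101", "102", "101"),
  pvalue = c(0.01, 0.04, 0.20, 0.02, 0.03, 0.001),
  logFC  = c(1.5, -0.8, 0.3, 1.2, -1.1, 2.0),
  stringsAsFactors = FALSE
)

# Fisher's method: -2 * sum(log(p)) follows a chi-squared distribution with
# 2k degrees of freedom under the null, for k independent p-values
fisher_combine <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

# One data.frame per gene, then one summary row per gene
by_gene <- split(res, res$geneid)
combined <- data.frame(
  ID         = names(by_gene),
  count      = sapply(by_gene, nrow),
  fisher_p   = sapply(by_gene, function(x) fisher_combine(x$pvalue)),
  mean_logFC = sapply(by_gene, function(x) mean(x$logFC)),
  stringsAsFactors = FALSE
)

# Genes found more often come first; ties are broken by the combined score
combined <- combined[order(-combined$count, combined$fisher_p), ]
print(combined)
```

Averaging log-FC values only makes sense when the comparisons share the same direction of contrast; otherwise the signs would need to be harmonised first.
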
>
> Thanks anyway for the help and code.
>
> So far I have found in the Bioc pages:
> matchBox
> OrderedList
> GeneSelector
> rankrank
>
> I have tried none so if anybody has preferences, I am all ears.
>
> Cheers
> Stephane Plaisance
> stephane.plaisance at vib.be
>
>
>
>
>
> On 27 Aug 2014, at 16:29, James W. MacDonald <jmacdon at uw.edu> wrote:
>
> Hi Stephane,
>
> If I understand you correctly, you have already made the comparisons and now
> simply want to rank genes by the number of comparisons in which they
> were found significant. I don't know of a particular package for doing
> this, but it is easy to do with functions in base R. All you
> would need to do (assuming you have a consistent identifier, such as Entrez
> Gene IDs, for each comparison) is concatenate all the IDs into a
> single vector and then count occurrences:
>
> mybigvec <- c(<all the DE gene IDs go here>)
> mylst <- split(mybigvec, mybigvec)
> df <- data.frame(ID = names(mylst), count = sapply(mylst, length))
> df <- df[order(df$count, decreasing = TRUE),]
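
For reference, here is a small self-contained version of the counting step above that can be run as-is. The three gene-ID vectors are invented for illustration; in practice each would hold the significant IDs from one comparison. It uses `table()` in place of `split()` plus `length()`, which gives the same counts.

```r
# Hypothetical significant Entrez Gene IDs from three comparisons
comp1 <- c("7157", "672", "1956")
comp2 <- c("7157", "1956", "5290")
comp3 <- c("7157", "3845")

# Each ID should be counted at most once per comparison, so de-duplicate first
mybigvec <- c(unique(comp1), unique(comp2), unique(comp3))

# table() counts how often each ID occurs in the concatenated vector
counts <- table(mybigvec)
df <- data.frame(ID = names(counts), count = as.vector(counts),
                 stringsAsFactors = FALSE)

# Most frequently significant genes first
df <- df[order(df$count, decreasing = TRUE), ]
print(df)
```
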
>
> You could also take things like gene symbols along for the ride by
> starting with a data.frame:
>
> mybigdf <- data.frame(symbols = <concatenate symbols from all comps>,
>                       geneid = <concatenate gene IDs from all comps>)
> mylst <- split(mybigdf, mybigdf$geneid)
> df <- data.frame(ID = names(mylst), count = sapply(mylst, nrow),
>                  symbol = sapply(mylst, function(x) as.character(x$symbols[1])))
> df <- df[order(df$count, decreasing = TRUE),]
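
A runnable toy version of the data.frame variant above might look like the following. The gene IDs and symbols are example values, and the column names simply follow the snippet; nothing here comes from real results.

```r
# Hypothetical concatenated results from all comparisons
mybigdf <- data.frame(
  symbols = c("TP53", "BRCA1", "EGFR", "TP53", "EGFR", "TP53"),
  geneid  = c("7157", "672", "1956", "7157", "1956", "7157"),
  stringsAsFactors = FALSE
)

# One data.frame per gene ID; its row count is the number of comparisons
# in which that gene was found
mylst <- split(mybigdf, mybigdf$geneid)
df <- data.frame(
  ID     = names(mylst),
  count  = sapply(mylst, nrow),
  symbol = sapply(mylst, function(x) x$symbols[1]),
  stringsAsFactors = FALSE
)

# Most frequently significant genes first
df <- df[order(df$count, decreasing = TRUE), ]
print(df)
```
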
>
> Best,
>
> Jim
>
>
>
>
> On Wed, Aug 27, 2014 at 6:48 AM, Stephane Plaisance | VIB | <
> stephane.plaisance at vib.be> wrote:
>
>> I have full genome/exome lists of DE genes resulting from MA and/or RNASeq
>> analyses using multiple methods (likely showing different genes even from
>> the same samples due to technology biases). I would like to rank these
>> lists to create a general list where redundant DE targets are pushed up and
>> unique hits are ranked lower.
>>
>> What method/package should I start with?
>>
>> Thanks
>>
>> Stephane Plaisance
>> stephane.plaisance at vib.be
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
>


