[BioC] WGCNA: help with comparing multiple GEO studies

Wed Aug 27 03:40:41 CEST 2014

Hi Abhishek,

If you are interested in modules that appear in all (or most) of your
data sets, you should run the consensus module analysis (e.g.,
blockwiseConsensusModules). At present the function has a bug which
forces all soft-thresholding powers to be the same, but I will post
the fix for it soon. The new version of blockwiseModules will also
feature an option to use full quantile normalization of input networks
to make them comparable, which should be more appropriate than the
simple single-quantile scaling used at present.
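
To illustrate the difference between the two calibration approaches, here
is a minimal, language-neutral sketch in plain Python. It is not the WGCNA
implementation. The function names, the toy values, and the choice of the
0.95 quantile are assumptions made for illustration, and the values are
assumed to lie strictly between 0 and 1 (as topological overlaps do).
Single-quantile scaling raises one network to a power so that a single
chosen quantile matches the reference; full quantile normalization gives
every network the same distribution of values.

```python
import math

def quantile(values, q):
    """Order-statistic quantile (no interpolation), enough for a sketch."""
    s = sorted(values)
    return s[int(q * (len(s) - 1))]

def single_quantile_scale(ref, other, q=0.95):
    """Raise `other` to a power so its q-th quantile matches that of `ref`.

    Assumes all values lie strictly between 0 and 1.
    """
    power = math.log(quantile(ref, q)) / math.log(quantile(other, q))
    return [v ** power for v in other]

def full_quantile_normalize(sets):
    """Give every set the same distribution: the rank-wise mean of sorted values."""
    n = len(sets[0])
    assert all(len(s) == n for s in sets)
    sorted_sets = [sorted(s) for s in sets]
    target = [sum(col) / len(sets) for col in zip(*sorted_sets)]
    out = []
    for s in sets:
        order = sorted(range(n), key=lambda i: s[i])  # indices from smallest to largest
        normalized = [0.0] * n
        for rank, i in enumerate(order):
            normalized[i] = target[rank]
        out.append(normalized)
    return out

# Hypothetical network values (e.g. overlaps) from two studies.
tom_a = [0.02, 0.10, 0.25, 0.40, 0.80]
tom_b = [0.01, 0.05, 0.10, 0.20, 0.50]

scaled_b = single_quantile_scale(tom_a, tom_b)
norm_a, norm_b = full_quantile_normalize([tom_a, tom_b])
```

After single-quantile scaling only the chosen quantile agrees between the
two sets, whereas after full quantile normalization the two sets contain
exactly the same values and differ only in which entry gets which value.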

I would think carefully about excluding small-size studies - this may
be appropriate if you have a big study in the same or very similar
conditions and you trust the big study. But if the small studies are
credible and there are no big studies in the same conditions, you can
keep them.

I would make sure that all of your input data sets are carefully
pre-processed, extreme outliers are removed, and probe sets are
summarized to gene-level data. You will need to restrict all data sets
to the same genes.
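
As a rough sketch of the last two steps (summarizing probes to genes and
restricting all sets to the same genes), here is a plain Python example.
The probe IDs, gene names, and function names are hypothetical, and
keeping the probe with the highest mean expression is only one possible
summarization rule, chosen here as an assumption.

```python
from statistics import mean

def collapse_probes(probe_expr, probe_to_gene):
    """Keep, for each gene, the probe with the highest mean expression."""
    best = {}
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # skip probes without a gene annotation
        if gene not in best or mean(values) > mean(best[gene]):
            best[gene] = values
    return best

def restrict_to_common_genes(studies):
    """Keep only genes present in every study, in one shared order."""
    common = sorted(set.intersection(*(set(s) for s in studies.values())))
    restricted = {name: {g: s[g] for g in common} for name, s in studies.items()}
    return restricted, common

# Hypothetical probe-level data for one study (probe -> values across samples).
study1_probes = {
    "p1": [1.0, 2.0, 3.0],
    "p2": [4.0, 5.0, 6.0],
    "p3": [2.0, 2.0, 2.0],
    "p4": [9.0, 9.0, 9.0],  # no annotation, will be dropped
}
annotation = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}

studies = {
    "study1": collapse_probes(study1_probes, annotation),
    # Hypothetical second study that is already at gene level.
    "study2": {"GENE_A": [7.0, 8.0], "GENE_C": [1.0, 1.5], "GENE_B": [3.0, 2.5]},
}

restricted, common = restrict_to_common_genes(studies)
```

Sorting the shared gene list gives every study the same gene order, which
matters when the sets are later passed together to a consensus analysis.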

Best,

Peter

On Fri, Aug 22, 2014 at 7:39 PM, Abhishek Pratap <abhishek.vit at gmail.com> wrote:
> Hi Steve and Peter
>
> My basic goal here is to study genetic similarities (if any) between a
> group of GEO studies. I have downloaded about 6-8 studies and, as one
> would expect, there is heterogeneity among them (different platforms,
> versions, study sizes (15-120 samples), etc.). After an initial
> normalization step on each study, I am trying to run a
> blockwiseConsensusModules analysis to find modules shared among these
> studies. I am only using genes shared across all of the studies.
>
> 1. I am wondering whether consensus analysis across the studies is the
> right approach here. Intuitively, I don't think I want to build modules
> on one study and compare them with another, as there are multiple
> studies to compare.
>
> 2. Given the varying sample sizes (15-120), I am not sure whether I
> should use a very high soft-thresholding power, given that 2 studies
> have < 20 samples, or whether I should exclude these studies.
>
> 3. I have gone through tutorial II (Consensus analysis of female and
> male liver expression data), but it is not clear to me, once the
> network is built, what the different ways are to look at the
> consensus modules across the studies and run functional enrichment
> analysis on them.
>
>
> Thanks!
> -Abhi