[BioC] What to do with this data? Question on deconfounding and GO analysis

Sat May 29 11:48:40 CEST 2010

Dear January

some suggestions below.

On 28/05/10 16:02, January Weiner wrote:
> Hello,
>
> I've been asked to analyze data from the following experiment.
>
> Two types of cells were analyzed either separately (A, B) or in a
> mixture (AB). In each experiment, either the separated cell types or
> the mixture was subjected to a treatment. From each such experiment, a
> single Agilent two-color microarray was prepared, with untreated cells
> used as a control.
>
> Of course, proper significance analysis cannot be done, and I can only
> use the technical p-values generated by the Agilent software. Due to
> the nature of the experiment, it is unlikely that another data set can
> be generated in the foreseeable future. However, the results in general
> show the expected response to treatment and activation of a number of
> genes that are supposed to be activated; thus, the technical p-values
> still give a meaningful "general picture".
>
> By manually going through the data it is obvious that in many cases,
> the response in AB is a weighted average of the responses A and B. I
> tried to estimate these global weights in a very naive manner, by
> looking at the correlation between the fold change in experiment AB,
> and the fold change estimated from experiments A and B for different
> values of p, the proportion of cells of type A in the mixture AB.
>
> My first question is therefore -- is there a recommended solution
> within Bioconductor that I could apply in such a case?

I am not sure there is, or there needs to be. It seems that your most 
basic model is

       AB = pA + (1-p)B

where AB, A and B are the fold changes observed in samples AB, A and B 
respectively. You can rearrange this to:

       p = (AB-B) / (A-B)

Hence I would do a scatterplot of (A-B) on the x-axis versus (AB-B) on
the y-axis, with one point per gene, and see if you can reasonably fit a
regression line through the origin; its slope is then an estimate of p.
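
A minimal sketch of that fit, in Python purely for illustration. The
function name and the fold-change values are made up, and I assume one
fold change per gene for each of A, B and AB, with the line forced through
the origin as the model implies:

```python
def estimate_mixing_proportion(fc_a, fc_b, fc_ab):
    """Least-squares slope through the origin of (AB - B) on (A - B)."""
    x = [a - b for a, b in zip(fc_a, fc_b)]      # x-axis: A - B
    y = [ab - b for ab, b in zip(fc_ab, fc_b)]   # y-axis: AB - B
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("A and B are identical; p cannot be estimated")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx                             # slope = estimate of p


# Hypothetical fold changes for five genes (not real data).
fc_a = [2.0, -1.0, 0.5, 3.0, 0.0]
fc_b = [0.0, 1.0, 0.5, -1.0, 2.0]
p_true = 0.3
# Simulate a mixture that follows AB = pA + (1-p)B exactly.
fc_ab = [p_true * a + (1 - p_true) * b for a, b in zip(fc_a, fc_b)]

p_hat = estimate_mixing_proportion(fc_a, fc_b, fc_ab)
print(round(p_hat, 3))
```

With real data the points will scatter around the line, and genes where A
and B are nearly equal carry little information about p.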

>
> Furthermore, I'd like to look for an interaction effect -- to predict
> genes, GO terms or pathways that behave "not according to predictions"
> in the mixture AB. For this, I assume that the technical p-values are
> meaningful (because I do not have another choice),

Yes, you do: ignore the p-values, and work with the fold-changes.
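
One possible way to do that, offered as a sketch rather than a
recommendation: take the deviation of each gene's AB fold change from the
mixture prediction pA + (1-p)B and rank genes by its absolute size. The
function name, gene labels and numbers below are hypothetical, and p is
assumed to have been estimated beforehand:

```python
def mixture_residuals(fc_a, fc_b, fc_ab, p):
    """Observed AB fold change minus the prediction p*A + (1-p)*B."""
    return [ab - (p * a + (1 - p) * b)
            for a, b, ab in zip(fc_a, fc_b, fc_ab)]


# Hypothetical fold changes for four genes (not real data).
genes = ["g1", "g2", "g3", "g4"]
fc_a = [2.0, -1.0, 0.5, 3.0]
fc_b = [0.0, 1.0, 0.5, -1.0]
fc_ab = [1.0, 0.0, 0.5, 3.0]   # g4 is built to deviate from the prediction
p = 0.5                        # assumed mixing proportion

residuals = mixture_residuals(fc_a, fc_b, fc_ab, p)

# Largest absolute deviation first: candidates for "not according to
# predictions" behaviour in the mixture.
ranked = sorted(zip(genes, residuals), key=lambda gr: abs(gr[1]), reverse=True)
for gene, res in ranked:
    print(gene, res)
```

Without replicates there is no noise estimate, so such a ranking is
descriptive only.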

>  and run a GO / SPIA
> analysis on the three microarrays separately. Then, I manually look
> through the results to find enriched terms which are different for the
> AB experiment.
>
> I wonder whether there is a possibility to compare results of two
> GO-analyses. One could, for example, look for changes in rank
> positions of different GO terms (since the p-values in such a set up
> would probably be not very meaningful).
>

Have a look at the Category package, in particular its vignette, which 
takes a slightly more abstracted view of gene set enrichments than "sets 
of genes with low p-values" - i.e. you can look at enrichment of 
arbitrarily constructed comparison statistics.
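
To illustrate the general idea only (this is not the Category package's
interface, and all names and numbers are invented), here is a Python
sketch that scores a gene set by the mean of an arbitrary per-gene
statistic and compares it with random gene sets of the same size:

```python
import random


def set_score(stats, gene_set):
    """Mean of the per-gene statistic over the set members that were measured."""
    members = [stats[g] for g in gene_set if g in stats]
    if not members:
        raise ValueError("no gene of the set has a statistic")
    return sum(members) / len(members)


def permutation_p_value(stats, gene_set, n_perm=2000, seed=0):
    """Share of random same-size gene sets scoring at least as high."""
    rng = random.Random(seed)
    genes = list(stats)
    k = sum(1 for g in gene_set if g in stats)
    observed = set_score(stats, gene_set)
    hits = 0
    for _ in range(n_perm):
        if set_score(stats, rng.sample(genes, k)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical statistic (e.g. an absolute fold-change difference) for 20 genes.
stats = {"g%d" % i: 0.1 for i in range(20)}
for g in ("g0", "g1", "g2"):
    stats[g] = 2.0
gene_set = ["g0", "g1", "g2"]   # hypothetical GO term membership

observed, p_value = permutation_p_value(stats, gene_set)
print(observed, p_value)
```

Gene-label permutation of this kind only asks whether the set stands out
from the other genes on the array; it says nothing about reproducibility.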

Also have a look at this paper, from your (and my) neighbours:

Nucleic Acids Res. 2010
GOing Bayesian: model-based gene set analysis of genome-scale data.
Bauer S, Gagneur J, Robinson PN.

> Thanks in advance for any help, suggestions, material for further reading etc.,
>
> j.
>

-- 

Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber