[BioC] get full variance per gene from DESeq
Wolfgang Huber
whuber at embl.de
Sat Aug 7 12:04:46 CEST 2010
Graham,
the approach you describe below seems reasonable. I think that an
essentially equivalent, and perhaps operationally easier, way to get
there is to use the data-dependent variance stabilising transformation
provided by DESeq (see the vignette). The result of that should be a
reasonable input for gene set enrichment computations.
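
As a rough illustration of what a variance stabilising transformation does, the sketch below uses base R only. It is not DESeq's implementation, which fits the variance-mean dependence from the data (see the vignette). Instead it assumes a negative binomial variance, var = mu + alpha * mu^2, with a fixed dispersion alpha, for which the transformation has a closed form. The function name vst_nb, the value of alpha and the simulated counts are all made up for the example.

```r
# Illustrative only: closed-form variance stabilising transformation for
# counts whose variance is mu + alpha * mu^2 (negative binomial).
# 'alpha' is an assumed, fixed dispersion; DESeq estimates the
# variance-mean dependence from the data instead.
vst_nb <- function(counts, alpha) {
  (2 / sqrt(alpha)) * asinh(sqrt(alpha * counts))
}

set.seed(1)
alpha <- 0.05                 # hypothetical dispersion
mu    <- c(20, 200, 2000)     # three expression levels
n     <- 20000                # simulated replicates per level

# Simulate counts; rnbinom's 'size' parameter is 1 / dispersion.
sim <- sapply(mu, function(m) rnbinom(n, mu = m, size = 1 / alpha))

sd_raw <- apply(sim, 2, sd)                 # grows strongly with the mean
sd_vst <- apply(vst_nb(sim, alpha), 2, sd)  # roughly constant across levels

print(round(rbind(mean = mu, sd_raw = sd_raw, sd_vst = sd_vst), 2))
```

On the transformed scale the standard deviation no longer depends much on the expression level, which is what makes per-sample values comparable across genes.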
A sample size of 6 is still pretty small for permutation testing;
perhaps you will need to do something parametric.
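
To make the sample-size point concrete: with 3 replicates in each of 2 conditions there are only choose(6, 3) = 20 ways to assign the group labels, which bounds the smallest p-value a label permutation test can return. The sketch below counts these and, as one possible parametric route, turns two-sided p-values into signed z-scores along the lines of the snippet quoted below. The data frame res and its values are made up; only the column names follow the quoted code. Note the leading minus sign: qnorm(pval/2) is never positive, so the minus is needed if z is meant to carry the sign of the fold change.

```r
# Number of distinct ways to label 6 samples as 3 vs 3.
n_perm <- choose(6, 3)

# Smallest attainable p-values when all relabellings are enumerated
# (the observed labelling is one of them).
min_p_one_sided <- 1 / n_perm
# With equal group sizes, a symmetric statistic ties with its mirror labelling.
min_p_two_sided <- 2 / n_perm

# Parametric alternative: signed z-scores from two-sided p-values.
# 'res' is a made-up stand-in for a table of per-gene test results.
res <- data.frame(
  pval           = c(0.001, 0.05, 0.5, 0.05),
  log2FoldChange = c(2.1, -1.3, 0.2, 1.0)
)
res$z <- with(res, -qnorm(pval / 2) * sign(log2FoldChange))

print(c(n_perm = n_perm, min_one = min_p_one_sided, min_two = min_p_two_sided))
print(res)
```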
Wolfgang
On 05/08/10 12:55, Graham Thomas wrote:
> Hi Wolfgang,
>
> Thank you for your reply to my previous message. Due to the low number
> of biological conditions I am comparing, and the algorithm I would
> ideally like to use (that proposed in Jiang and Gentleman (2007)) I am
> inclined to think that I will obtain greater power if I can obtain a
> single z-score for each gene in each replicate (6 measurements) rather
> than in each condition (2 measurements).
>
> As far as I'm aware, 3 measurements per group will improve my power in
> permutation testing during GSEA. If this is not the case, then I may test
> using the z-scores obtained as suggested with the following:
>
> with(BNevBTG,
>   qnorm(pval/2) * sign(log2FoldChange)
> )
>
> Is this correct?
>
> Otherwise, I require a method to obtain an appropriate variance estimate
> for each gene per condition. I can outline the approach I am considering
> (and the problem I have) with the following argument:-
>
>       geneA  geneB
> 1)       50    200
> 2)       55    150
> 3)       45    175
>
> My thinking is that in order to obtain appropriate z-scores for each
> measurement geneA[1,] -> geneA[3,] (and, of course, as appropriate for
> geneB) I may use the following:
>
> vf <- rawVarFunc( cds, "geneA" )
>
> And then add the variance due to the Poisson process (this is what I am
> unclear about how to obtain). With this I can use baseMeanA to generate
> z-scores for each measurement in the group. As a sanity check at this
> point it may not be absurd to do some kind of non-specific filtering
> based on the residuals I get?
>
> Is this a reasonable approach to calculating my z-scores and, if so, how
> do I get the variance due to the counting process? If not, any comments
> or suggestions on how to proceed would be greatly appreciated! Apologies
> for the verbose explanation!
>
> Regards,
> Graham
>
>
>
--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
