[BioC] get full variance per gene from DESeq

Wolfgang Huber whuber at embl.de
Sat Aug 7 12:04:46 CEST 2010


Graham

the approach you describe below seems reasonable. I think that an 
essentially equivalent, and perhaps operationally easier way to get 
there is to use the data-dependent variance stabilising transformation 
provided by DESeq (see the vignette). The result of that should be a 
reasonable input for gene set enrichment computations.

A sample size of 6 is still pretty small for permutation testing, 
perhaps you will need to do something parametric.

  Wolfgang



On 05/08/10 12:55, Graham Thomas wrote:
> Hi Wolfgang,
>
> Thank you for your reply to my previous message. Due to the low number
> of biological conditions I am comparing, and the algorithm I would
> ideally like to use (that proposed in Jiang and Gentleman (2007)) I am
> inclined to think that I will obtain greater power if I can obtain a
> single z-score for each gene and each replicate (6 measurements) rather
> than per replicate (which is 2).
>
> As far as I'm aware 3 measurements per group will improve my power in
> permutation testing during GSEA, if this is not the case then I may test
> using the z-scores obtained as suggested with the following:
>
> with(BNevBTG,
> qnorm(pval/2) * sign(log2FoldChange)
> )
>
> Is this correct?
>
> Otherwise, I require a method to obtain an appropriate variance estimate
> for each gene per condition. I can outline the approach I am considering
> (and the problem I have) with the following argument:-
>
> geneA geneB
> 1) 50 200
> 2) 55 150
> 3) 45 175
>
> My thinking is that in order to obtain appropriate z-scores for each
> measurement geneA[1,] -> geneA[3,] (and, of course, as appropriate for
> geneB) I may use the following:
>
> vf <- rawVarFunc( cds, "geneA)" )
>
> And then add the variance due to the Poisson process (this is what I am
> unclear about how to obtain). With this I can use baseMeanA to generate
> z-scores for each measurement in the group. As a sanity check at this
> point it may not be absurd to do some kind of non-specific filtering
> based on the residuals I get?
>
> Is this a reasonable approach to calculating my z-scores and, if so, how
> do I get the variance due to the counting process? If not, any comments
> or suggestions on how to proceed would be greatly appreciated! Apologies
> for the verbose explanation!
>
> Regards,
> Graham
>
>
>
> On 02/08/10 11:00, bioconductor-request at stat.math.ethz.ch wrote:
>> Message: 2 Date: Sun, 01 Aug 2010 22:50:57 +0200 From: Wolfgang Huber
>> <whuber at embl.de> To: bioconductor at stat.math.ethz.ch Subject: Re:
>> [BioC] get full variance per gene from DESeq Message-ID:
>> <4C55DE31.1010700 at embl.de> Content-Type: text/plain;
>> charset=ISO-8859-1; format=flowed Hi Graham an alternative, and maybe
>> easier and slightly more consistent approach might be to use the
>> p-values (and the sign of the foldChange) to transform to z-scores,
>> perhaps like this: with(BNevBTG, qnorm(pval/2) * sign(log2FoldChange)
>> ) PS: "more consistent" because the residuals distribution (of the
>> counts is skewed), while the p-values are uniform under the null. Best
>> wishes Wolfgang On Jul/30/10 11:47 AM, Graham Thomas wrote:
>>> > Hi All,
>>> >
>>> > I am wondering whether there is an easy way of obtaining the full
>>> variance
>>> > for a given gene and condition from my DESeq analysis?
>>> >
>>> > When I take a look at my results I end up with a table like so):
>>> >
>>> > > head ( BNevBTG )
>>> > id baseMean baseMeanA baseMeanB foldChange log2FoldChange
>>> > 1 ENSMUSG00000001627 162.62034 119.50785 205.73284 1.7215006
>>> 0.78366671
>>> > 2 ENSMUSG00000001630 45.51063 41.94099 49.08027 1.1702220 0.22678230
>>> > 3 ENSMUSG00000001632 1626.19532 1328.32256 1924.06807 1.4484946
>>> 0.53455431
>>> > 4 ENSMUSG00000001642 54.09075 54.53378 53.64772 0.9837521 -0.02363326
>>> > 5 ENSMUSG00000001655 0.00000 0.00000 0.00000 NaN NaN
>>> > 6 ENSMUSG00000001656 0.00000 0.00000 0.00000 NaN NaN
>>> > pval padj resVarA resVarB
>>> > 1 9.930922e-05 0.0005666467 0.4043083 2.7960349
>>> > 2 4.628048e-01 0.6573258560 0.1815569 0.5295679
>>> > 3 4.216269e-04 0.0021251136 0.3436054 0.3589968
>>> > 4 9.559678e-01 1.0000000000 1.0424044 0.8391154
>>> > 5 NA NA NA NA
>>> > 6 NA NA NA NA
>>> >
>>> >
>>> > I would like to transform my results into z-scores for GSEA
>>> purposes. As
>>> > far as I understand in order to do this I require my baseMean value
>>> > (which I have), my baseMeanA(and B) values, and the full variance
>>> (which
>>> > I want).
>>> >
>>> > It may be noted that my practical statistics knowledge is HEAVILY
>>> > limited so any helpp at all here is greatly appreiciated!
>>> >
>>> >
>>> > Regards,
>>> > Graham
>>> >
>>> >
>> -- Wolfgang Huber EMBL
>> http://www.embl.de/research/units/genome_biology/huber
>> ------------------------------
>
>

-- 


Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber



More information about the Bioconductor mailing list