[BioC] DEXseq: several "1" and "NA" in padjust column

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Feb 5 14:12:41 CET 2013


Hi,

Your code seems correct -- although I think the vignette has you you
estimate log2foldChanges  after differential exon usage testing, but
that shouldn't matter.

It also looks as if you have plenty of testable exons (almost 200k),
so the question is:

(1) For every exon that is testable, do you get a value in the
`pvalue` column for it? That is to say, the only NA's in the pvalue
column should be for rows that are not `testable`, eg:

R> all(!is.na(subset(fData(exonCounts), testable)$pvalue))

(2) Are the NA's in your padjust column only present because the
pvalue column is also NA?

R> all(is.na(fData(exonCounts)$pvalue) == is.na(fData(exonCounts)$padjust))

(3) If 1 and 2 are kosher -- we might be running out of things to
debug, and perhaps the high pvalues are what we're stuck with here.
What if you plot the log2fold(what/ever) aginst the -log10(pvalue)? Do
you see large fold changes that are getting small pvalues that look
like they should be "real"? That is to say, large fold changes might
come from a region of extreme low expression (although the vst xform I
think does try to mitigate that), so while the fold change is "high",
the pvalue won't be.

HTH,
-steve

On Tue, Feb 5, 2013 at 6:50 AM, Vincenzo Capece <vivo0304 at gmail.com> wrote:
> Thanks a lot Steve.
> This is the last part of my pipeline(the first part is just the
> concatenation of strings to build the paths to input files, but they are
> right):
>
>> samples
>       condition replicate
> 28M_1        28         1
> 28M_2        28         2
> 28M_3        28         3
> 3M_1          3         1
> 3M_2          3         2
> 3M_3          3         3
>
> exonCounts=read.HTSeqCounts(countfiles = fullFileNames,design =
> samples,flattenedfile = annotationFile)
>
> exonCounts=estimateSizeFactors(exonCounts)
>
> #sizeFactors(exonCounts)
>
> exonCounts=estimateDispersions(exonCounts,nCores=32)
>
> exonCounts=fitDispersionFunction(exonCounts)
>
> exonCounts=estimatelog2FoldChanges(exonCounts,nCores=32)
>
> exonCounts=testForDEU(exonCounts,nCores=32)
>
> res=DEUresultTable(exonCounts)
>
> DEXSeqHTML(exonCounts, path=pathToExonFiles ,color = c("#DDDDDD",
> "#CCFFFF"))
>
> plotOut=paste(pathToExonFiles,"res.csv",sep="/")
> write.csv(res,file=plotOut)
>
> This is the command that you suggested me to use:
>
>> table(fData(exonCounts)$testable)
>
>  FALSE   TRUE
> 151884 199146
>
> If you think that my pipeline is right, I can send you the file "res.csv" in
> private way.
>
> Thanks a lot
>
>
> On 5 February 2013 12:17, Steve Lianoglou <mailinglist.honeypot at gmail.com>
> wrote:
>>
>> Hi Vincenzo,
>>
>> On Tue, Feb 5, 2013 at 5:11 AM, Vincenzo Capece <vivo0304 at gmail.com>
>> wrote:
>> > Dear bioconductorers,
>> > I am using DEXseq for the splicing analysis.
>> > The pipeline works perfectly, but I have some doubts about the output:
>> > on
>> > 351000 exons I have just 10 padjust significant ones.
>> > Googling I found this thread, in which the user developed my same
>> > pipeline
>> > and in which she has the same problem:
>> > https://stat.ethz.ch/pipermail/bioconductor/2012-February/043476.html
>> > This thread finished in a "private" way, but I need to know if there are
>> > any bugs(I think fixed, because It past one year);
>>
>> The thread actually finished off "publicly." Perhaps you thing it went
>> private because you see messages like "embedded and
>> charset-unspecified text was scrubbed" in the emails of the thread
>> that follow the one you posted?
>>
>> If you click on the URL below, it's there:
>>
>> https://stat.ethz.ch/pipermail/bioconductor/attachments/20120213/5f13a499/attachment.pl
>>
>> That having been said, it's hard to say what (if anything) is going
>> wrong with your analysis without you posting any source code, but you
>> can calculate the adjust pvalues yourself.
>>
>> After running the testForDEU step, the fData(ecs)$pvalue and
>> fData(ecs)$padjust columns should be filled. One could calculate the
>> padjust column manually by doing:
>>
>> adjusted <- p.adjust(fData(ecs)$pvalue, 'BH')
>>
>> but if the values in the pvalue column are NA, so too will be the
>> adjust pvalues.
>>
>> If you have many NA pvalues after your testForDEU, then there might be
>> several reasons why they are there:
>>
>> (1) These exons have been marked as "untestable" (fData(ecs)$testable)
>> during the estimateDispersions step, which might be due to:
>>   (a) read counts for the exon below the "minCount" required per exon
>>   (b) the exon is in a gene with more than "maxExon" exons
>>   (c) the gene only has one exon?
>> (2) The calculation of the pvalue errored and didn't return anything
>> -- not sure why this would happen so frequently, though.
>> (3) ... ? ...
>>
>> > moreover I am interested
>> > knowing if the problem was at "user level" or "source code level" and,
>> > in
>> > the first case, how she fixed the bug.
>>
>> I'm not sure if this can be answered without your data, but you will
>> first need to post the code yo used.
>>
>> As for the original poster that you mentioned -- you'll see how her
>> problem was fixed when you click on the URL's in the messages that
>> look "hidden" from view.
>>
>> > Thanks in advance and congratulation for your package: it is awesome.
>>
>> Hear, hear!
>>
>> HTH,
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  | Memorial Sloan-Kettering Cancer Center
>>  | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
>
>
>
> --
> Regards,
>             Capece Vincenzo



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioconductor mailing list