[BioC] DEXSeq for 1 gene

Alejandro Reyes alejandro.reyes at embl.de
Tue May 28 17:56:25 CEST 2013


Dear Eamonn,

With such low counts for a each exon, you can not infer any isoform 
regulation using either DEXSeq or any tool, since the signal to noise 
ratio is very low.
Do all your genes have similar low numbers? If so, I would check the 
data quality or the quality of the transcript assemblies that you are 
using.  If other genes have higher counts it could also mean that this 
gene is simply not so expressed in your samples.

Best regards,
Alejandro Reyes



> Hi Wolfgang
> Although I have 20 samples, only 11 are under the conditions I'm
> interested in this analysis. Not sure why I mentioned 20 in the first
> place sorry. Of those 11, 2 are giving odd results from top hat so I
> haven't included them yet until I figure out what is wrong. So I have 9 to
> work with
>
>> counts(ecs)
> K61 K62 K63 K83 Q61 Q62 Q63 Q81 Q82
> XLOC_000001:E001 1 0 1 0 1 0 0 0 0
> XLOC_000001:E002 1 0 0 0 2 1 0 1 1
> XLOC_000001:E003 0 0 0 0 3 0 0 0 0
> XLOC_000001:E004 1 0 1 0 1 0 0 1 0
> XLOC_000001:E005 1 3 3 2 3 0 0 0 0
> XLOC_000001:E006 0 0 0 0 0 0 0 0 0
> XLOC_000001:E007 0 1 2 2 2 0 3 2 0
> XLOC_000001:E008 2 1 2 0 0 2 0 1 0
> XLOC_000001:E009 1 2 0 0 1 0 0 2 0
> XLOC_000001:E010 4 0 1 1 0 0 0 1 0
> XLOC_000001:E011 0 0 0 0 0 1 0 0 0
> XLOC_000001:E012 0 0 1 1 0 2 0 2 0
> XLOC_000001:E013 1 1 3 2 2 2 1 3 0
> XLOC_000001:E014 3 3 1 2 2 1 3 0 1
> XLOC_000001:E015 1 2 1 2 4 2 0 0 3
> XLOC_000001:E016 3 5 2 2 7 1 0 5 0
> XLOC_000001:E017 1 2 0 2 2 2 1 3 1
> XLOC_000001:E018 0 4 3 1 6 3 2 2 1
> XLOC_000001:E019 2 2 3 2 2 2 0 5 3
> XLOC_000001:E020 3 3 2 0 3 1 2 4 2
> XLOC_000001:E021 2 1 2 4 3 1 0 3 1
> XLOC_000001:E022 2 0 1 4 3 1 0 2 1
> XLOC_000001:E023 0 0 3 1 0 2 0 2 1
> XLOC_000001:E024 3 1 0 1 0 0 0 1 1
> XLOC_000001:E025 0 1 0 2 2 2 2 0 0
> XLOC_000001:E026 0 0 2 2 1 2 0 1 0
> XLOC_000001:E027 6 1 2 3 0 0 0 4 1
> XLOC_000001:E028 7 2 3 3 0 4 1 5 0
> XLOC_000001:E029 5 4 1 5 0 5 5 3 1
> XLOC_000001:E030 0 1 1 0 3 0 0 0 0
> XLOC_000001:E031 1 1 0 0 1 0 0 1 0
>
> The low read count I put down to only looking at a single gene. Is my data
> a loss cost for the dexseq pipeline?
>
> Thanks for your help
>
> Eamonn
>
>
>
> Dr Eamonn Mallon
> Lecturer in Evolutionary Biology
> Adrian 220
> Biology Department
> University of Leicester
>
> http://www2.le.ac.uk/departments/biology/people/mallon
>
>
>
>
>
> On 26/05/2013 13:12, "Wolfgang Huber" <whuber at embl.de> wrote:
>
>> Dear Eamonn
>>
> >from your initial post -i.e. the output of head(counts(ecs))- it looks
>> like your counts are very low? This would make estimateSizeFactors
>> produce lots of NA values, but even if it did not, it would mean that
>> you'd have essentially no statistical power to detect anything useful
> >from such low counts.
>> You mentioned "20 bumblebee samples" but the table whose head you show
>> only has 9?
>>
>> Can you post the full output of 'counts(ecs)' and make sure its content
>> is plausible based on your understanding of the data?
>>
>> 	Best wishes
>> 	Wolfgang
>>
>>
>> On May 23, 2013, at 2:03 pm, Eamonn Mallon <ebm3 at leicester.ac.uk> wrote:
>>
>>> Hi Alejandro,
>>> Thanks very much for getting back to me.
>>>
>>> I think I got the second parameter idea from DESeq. I see I am wrong.
>>> so I ran
>>>
>>>> ecs<-estimateSizeFactors(ecs)
>>>> sizeFactors(ecs)
>>> K61 K62 K63 K83 Q61 Q62 Q63 Q81 Q82
>>> NA  NA  NA  NA  NA  NA  NA  NA  NA
>>>
>>> Is there any way to get DEXSeq to estimate size factors with my small
>>> dataset?
>>>
>>> If you are only interested in a single gene maybe you could visualize
>>> directly the counts per exon using the function plotDEXSeq.
>>>
>>> I'm not sure how to do this I tried
>>>
>>>> plotDEXSeq(ecs,"XLOC_000001", fitExpToVar="condition",
>>> norCounts=FALSE, expression=FALSE, splicing=FALSE,
>>> displayTranscripts=FALSE, legend=TRUE)
>>>
>>> Error in plotDEXSeq(ecs, "XLOC_000001", fitExpToVar = "condition",
>>> norCounts = FALSE,  :
>>>   Please estimate sizeFactors first
>>>
>>> I guess this is because modelFrameForGene requires sizefactors.
>>>
>>> Any ideas?
>>>
>>> Eamonn
>>>
>>>
>>> On 22/05/13 18:07, Alejandro Reyes wrote:
>>>> Dear Eamonn,
>>>>
>>>> Thanks for your interest in DEXSeq.
>>>>
>>>> In your size factor estimation, where did you get your second parameter
>>>> from? (check ?estimateSizeFactors) There is no need to specify anything
>>>> else than the ExonCountSet object and that is the reason you are
>>>> getting
>>>> the error message.
>>>>
>>>> If you are only interested in a single gene maybe you could visualize
>>>> directly the counts per exon using the function plotDEXSeq.
>>>>
>>>> Best regards,
>>>> Alejandro Reyes
>>>>
>>>>
>>>>> Dear All,
>>>>> I have RNA-seq data for 20 bumblebee samples divided into treatment
>>>>> and control. I am interested in differential exon usage. Unfortunately
>>>>> the bumblebee genome is not at the stage where I can do a complete
>>>>> DEXSeq analysis genome wide. I decided to look at a single gene where
>>>>> I have a decent gene model. Using TopHat2 and the python scripts
>>>>> included in DEXSeq I was able to produce an ExonCountSet object.
>>>>>
>>>>>> head(counts(ecs))
>>>>>                    K61 K62 K63 K83 Q61 Q62 Q63 Q81 Q82
>>>>> XLOC_000001:E001   1   0   1   0   1   0   0   0   0
>>>>> XLOC_000001:E002   1   0   0   0   2   1   0   1   1
>>>>> XLOC_000001:E003   0   0   0   0   3   0   0   0   0
>>>>> XLOC_000001:E004   1   0   1   0   1   0   0   1   0
>>>>> XLOC_000001:E005   1   3   3   2   3   0   0   0   0
>>>>> XLOC_000001:E006   0   0   0   0   0   0   0   0   0
>>>>>
>>>>> As I expected, its all come crashing down round my ears at the
>>>>> analysis stage
>>>>>
>>>>>> sizeFactors(ecs)
>>>>> K61 K62 K63 K83 Q61 Q62 Q63 Q81 Q82
>>>>>    NA  NA  NA  NA  NA  NA  NA  NA  NA
>>>>>
>>>>> I tried using the shorth value
>>>>>
>>>>>> ecs<-estimateSizeFactors(ecs,(locfunc=genefilter::shorth((counts(ecs))
>>>>>> ,tie.action="min")))
>>>>> Error in .local(object, ...) : unused argument (0.2)
>>>>>
>>>>> I understand that my main problem is that the DEXSeq analysis should
>>>>> be genome wide (my data has too few counts).
>>>>>
>>>>> Is there a way to use DEXSeq to ONLY look at a single gene? If not
>>>>> can anyone think of a statistical way to analyse my data (cross tabs?
>>>>> How should I normalise?)
>>>>>
>>>>> Thanks for your time
>>>>>
>>>>> Eamonn
>>>>>
>>>>>
>>>>>> sessionInfo()
>>>>> R version 3.0.0 (2013-04-03)
>>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>>>>>
>>>>> attached base packages:
>>>>> [1] parallel  stats     graphics  grDevices utils     datasets
>>>>> methods   base
>>>>>
>>>>> other attached packages:
>>>>> [1] genefilter_1.42.0  DEXSeq_1.6.0       Biobase_2.20.0
>>>>> BiocGenerics_0.6.0
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>    [1] annotate_1.38.0      AnnotationDbi_1.22.5 biomaRt_2.16.0
>>>>> Biostrings_2.28.0    bitops_1.0-5         DBI_0.2-7
>>>>>    [7] GenomicRanges_1.12.3 hwriter_1.3          IRanges_1.18.1
>>>>> RCurl_1.95-4.1       Rsamtools_1.12.3     RSQLite_0.11.3
>>>>> [13] splines_3.0.0        statmod_1.4.17       stats4_3.0.0
>>>>> stringr_0.6.2        survival_2.37-4      tools_3.0.0
>>>>> [19] XML_3.95-0.2         xtable_1.7-1         zlibbioc_1.6.0
>>>>> Dr Eamonn Mallon
>>>>> Lecturer in Evolutionary Biology
>>>>> Adrian 220
>>>>> Biology Department
>>>>> University of Leicester
>>>>>
>>>>> http://www2.le.ac.uk/departments/biology/people/mallon
>>>>>
>>>>>
>>>>> 	[[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list