[BioC] DEXSeq package - read.HTSeqCount function error

Matteo Carrara carrara.matt at gmail.com
Fri Feb 1 14:26:32 CET 2013


Dear Alejandro,

thank you for the quick reply. After you mentioned the coherence of
the input, I dug into it further. It looks like the problem was quite
trivial: I accidentally left the "paired" option of the counting
script at its default, even though my datasets were all paired-end. I
have run the script again with the correct parameters and the loading
now completes successfully.

Your question about the samples is far from being unrelated, since it
has an impact on the next steps of the analysis.
When I wrote the message I was just testing the function, so I limited
the loading to a single dataset. The real design comprises two
different conditions with two biological replicates, each of them with
two technical replicates, for a total of four datasets per condition.
I decided to use DEXSeq precisely to assess the differential exon usage
between the two conditions.
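
To make the layout concrete, here is a minimal sketch of the design as a
sample table in base R. The sample names and the condition labels "A" and
"B" are hypothetical placeholders; only the structure (two conditions, two
biological replicates each, two technical replicates each) comes from the
description above.

```r
# Hypothetical labels; only the 2 x 2 x 2 layout is taken from the message.
design <- expand.grid(
  tech_rep  = c("t1", "t2"),   # technical replicates
  bio_rep   = c("b1", "b2"),   # biological replicates
  condition = c("A", "B"),     # the two conditions
  stringsAsFactors = FALSE
)

# Build one (hypothetical) sample name per dataset and reorder the columns.
design$sample <- with(design, paste(condition, bio_rep, tech_rep, sep = "_"))
design <- design[, c("sample", "condition", "bio_rep", "tech_rep")]

print(design)

# Number of datasets per condition.
per_condition <- table(design$condition)
print(per_condition)
```

Each condition ends up with four rows, one per dataset. How the technical
replicates should be treated downstream is a separate question that this
sketch does not address.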

Thank you again.

Best,
--
Matteo Carrara
PhD Student in Complex Systems for Life Sciences
Department of Biotechnology and Health Sciences
MBC - Molecular Biotechnology Center
via Nizza, 52 Torino
ITALY



On Tue, Jan 29, 2013 at 6:49 PM, Alejandro Reyes
<alejandro.reyes at embl.de> wrote:
>
> Dear Matteo Carrara,
>
> Could you add to this e-mail the first 10 and last 10 lines of your input files (both counts and annotation files produced by the python scripts)?
>
> On unrelated topics, I noticed that you have only one sample, what exactly do you want to do with DEXSeq in this case?
> Note that DEXSeq is designed to test for differences in exon usage between different conditions with replicates.
>
> Best wishes,
> Alejandro Reyes
>
>
>> Hello,
>>
>> I have been trying to learn how to perform a differential expression
>> analysis of RNA-seq data using the DEXSeq package lately and I encountered
>> an unexpected behaviour in the function read.HTSeqCount: the function fails
>> to load the file obtained from the python script "dexseq_count.py" with the
>> following error message:
>>
>> Error in strsplit(rownames(dcounts), ":") : non-character argument
>>
>> I would really appreciate any pointers that might help me correct my code
>> or my input files.
>>
>> Here is what I have done:
>> - downloaded the mm9 GTF gene set from www.ensembl.org and ran the script
>> "dexseq_prepare_annotation.py"
>> - mapped my raw RNA-seq reads to the mm9 genome using tophat, converting
>> the output to sorted SAM format
>> - ran the script "dexseq_count.py" using the "flattened" GTF and the SAM
>> file obtained before
>> - loaded the dataset in R using the function read.HTSeqCount() as follows:
>>
>> --------------------------------------
>> > library(DEXSeq)
>> > wt<-read.HTSeqCount("./wt_mapped.counts", "WT",
>> +     flattenedfile="./flattened_mm9.gtf")
>> Error in strsplit(rownames(dcounts), ":") : non-character argument
>> --------------------------------------
>>
>> As far as I could understand, the "pasilla" package, used for the examples
>> in the vignette, provided a counts file under the name
>> "pasilla_gene_counts.tsv". Loading that file, however, results in the same
>> error message.
>>
>> All I could do was pinpoint the source of the error in the code of the
>> function, although that did not help me find a solution or a
>> workaround:
>> After creating the data frame "dcounts" storing the counts and setting the
>> row names, that same data frame is sub-set
>>
>>      dcounts <- dcounts[substr(rownames(dcounts), 1, 1) != "_",
>>          ]
>>
>> This code, however, changes the object dcounts in such a way that the
>> "rownames()" function returns NULL. The next statement is then bound to
>> fail, since it requires rownames(dcounts) to be a character vector:
>>
>>      genesrle <- sapply(strsplit(rownames(dcounts), ":"), "[[",
>>          1)
>>
>> I am running R 2.15.2  and DEXSeq  1.4.0 from Bioconductor version 2.11,
>> but I was able to reproduce this on the devel version of R (2013-01-22
>> r61734) using DEXSeq_1.5.6 from Bioconductor version 2.12.
>>
>> ---------------------------------
>> > sessionInfo()
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] BiocInstaller_1.8.3 DEXSeq_1.4.0        Biobase_2.18.0
>> [4] BiocGenerics_0.4.0
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.14.0 hwriter_1.3    RCurl_1.95-3   statmod_1.4.16 stringr_0.6.2
>> [6] tools_2.15.2   XML_3.95-0.1
>> --------------------------------
>>
>> Thank you in advance for any help you can provide.
>> Best Regards,
>
>
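
The failure mode described in the quoted message (rownames() returning NULL
after the subsetting step, and strsplit() then failing) can be reproduced in
isolation. The sketch below assumes that dcounts is a one-column matrix, as
might be the case when a single count file is loaded; that is an assumption
for illustration, not something taken from the DEXSeq source, and the row
names and counts are made up.

```r
# Toy one-column count matrix (hypothetical values and row names).
dcounts <- matrix(c(5L, 3L, 7L), ncol = 1)
rownames(dcounts) <- c("geneA:E001", "geneA:E002", "_ambiguous")

keep <- substr(rownames(dcounts), 1, 1) != "_"

# Default subsetting drops the single column: the result is a plain vector,
# which has no dimnames, so rownames() returns NULL.
dropped <- dcounts[keep, ]
print(rownames(dropped))

# strsplit() on NULL raises the "non-character argument" error.
err <- tryCatch(strsplit(rownames(dropped), ":"), error = function(e) e)
print(conditionMessage(err))

# With drop = FALSE the matrix structure and its row names are preserved.
kept <- dcounts[keep, , drop = FALSE]
genes <- sapply(strsplit(rownames(kept), ":"), "[[", 1)
print(genes)
```

Whether this is what actually happens inside read.HTSeqCount() depends on
how the function builds dcounts, which is not shown in full here.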





