[BioC] Error Using DESeq with HTSeq-Count

Devon Ryan dpryan at dpryan.com
Mon Jan 27 16:28:14 CET 2014


Hi Veronica,

The SAM file that htseq-count optionally writes (the -o option) is mostly for debugging. What you need to load instead is the count table, which htseq-count prints to standard output.

If your original command was something of the form:
samtools view alignments.bam | htseq-count -o alignment.htseq.sam - something.gff

then simply do instead:
samtools view alignments.bam | htseq-count - something.gff > alignment.counts

The alignment.counts file would then be appropriate for loading into R.
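
If you want to check which kind of file you have before loading it, something like the sketch below may help. It assumes the count table has exactly two tab-separated columns per line (feature ID and count), whereas the SAM output has eleven or more. The function name is_count_table is just a name made up for this example, not part of htseq-count.

```shell
# Hypothetical helper (not part of htseq-count): succeeds only if the file is
# non-empty and every line has exactly two tab-separated fields, which is the
# assumed layout of an htseq-count table (feature ID, count).
is_count_table() {
    awk -F '\t' '
        NF != 2 { bad = 1; exit }           # wrong column count: stop reading
        END     { if (bad || NR == 0) exit 1 }
    ' "$1"
}

# Example use; alignment.counts is the file name from the command above.
if [ -f alignment.counts ]; then
    if is_count_table alignment.counts; then
        echo "alignment.counts looks like a count table"
    else
        echo "alignment.counts does not look like a count table" >&2
    fi
fi
```

A SAM file such as alignment.htseq.sam would fail this check on its first alignment line, which is the same mismatch behind the "did not have 24 elements" error in R.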

@Michael et alii, maybe you guys could update the htseq webpage and such to make this more explicit. I've seen a number of people (mostly on seqanswers) have this same misunderstanding.

Regards,
Devon

____________________________________________
Devon Ryan, Ph.D.
Email: dpryan at dpryan.com
Tel: +49 (0)178 298-6067
Molecular and Cellular Cognition Lab
German Centre for Neurodegenerative Diseases (DZNE)
Ludwig-Erhard-Allee 2
53175 Bonn, Germany

On Jan 27, 2014, at 4:04 PM, Xiaoyu Liang wrote:

> Hi Mike,
> 
> Thank you for the response. Sorry I didn't include that information.
> 
> I pasted 3 lines from the HTSeq-count output sam file
> 
> PLATYPUS_627RLAAXX:4:001:01065:10528    163     chrX    23803314
> 50      51M     =       23803885        622
> GCCATGGCTACTTGTTTCTGTAATACATGCATGTGTGTTTTTTAAAACCTA
> T`cccc^YacbL`TTTa\TTbbbYYcL\c`^cYcc_c^`cc]b]Y\ccbY^     AS:i:0  XN:i:0
> XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:51 YT:Z:UU NH:i:1  XF:Z:no_feature
> PLATYPUS_627RLAAXX:4:001:01065:11031    97      chr12   56552177
> 50      12M279N28M875N11M       =       56553900        1868
> GCAGTCAAGATGTGTGACTTCACCGAAGACCAGACCGCAGAGTTCAAGGAG
> cc`\\dd^dT^ccca`Y`]YYa```TccTab\YL_a`YK`]LK\_]U]]UY     AS:i:0  XM:i:0
> XO:i:0  XG:i:0  MD:Z:51 NM:i:0  XS:A:+  NH:i:1  XF:Z:MYL6
> PLATYPUS_627RLAAXX:4:001:01065:11031    145     chr12   56553900
> 50      33M94N18M       =       56552177        -1868
> GTGCTGAAATCCGGCATGTTCTTGTCGCACTGGGTGAGAAGATGACAGAGG
> a_\L`]TaTTbZU[LZTaRYaTYb`YIZSTZ_]UYQS]]\\`T[``TbT^b     AS:i:-6 XM:i:1
> XO:i:0  XG:i:0  MD:Z:26A24      NM:i:1  XS:A:+  NH:i:1  XF:Z:MYL6
> 
> In the R package,
> I have a data frame that looks like:
>> table
>  samplename    filename condition
> 1      NML-1 NML-1-htout       NML
> 2      NML-2 NML-2-htout       NML
> 3      LMP-1 LMP-1-htout       LMP
> 4      LMP-2 LMP-2-htout       LMP
> 
> Then I call the following in R
> cds = newCountDataSetFromHTSeqCount(table, directory=".")
> 
> 
> Veronica
> 
> 
> On Mon, Jan 27, 2014 at 9:39 AM, Michael Love
> <michaelisaiahlove at gmail.com> wrote:
> 
>> Hi Veronica,
>> 
>> Could you paste the head of the htseq count table file? Maybe there is
>> some clue as to what is going wrong.
>> 
>> Also it's a good idea to include all your code (command line and R) to
>> help package maintainers diagnose what might be going on.
>> 
>> Mike
>> On Jan 26, 2014 6:16 PM, "Xiaoyu Liang" <veronica.xiaoyu at gmail.com> wrote:
>> 
>>> Hi all,
>>> 
>>> I was trying to use DESeq with the count table obtained from HTSeq-count.
>>> When I used the function "newCountDataSetFromHTSeqCount", it gave me an
>>> error complaining some lines of the count table do not have 24 columns.
>>> 
>>> The error message looks like:
>>> 
>>> "Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
>>> na.strings,  :
>>>  line 5 did not have 24 elements"
>>> 
>>> I checked the HTSeq-count table results. It is not just a couple of lines
>>> that don't have 24 columns; most of them don't. So I can't skip those lines.
>>> 
>>> Is there anything wrong with the HTSeq-count results? Could anybody give me
>>> any suggestions?
>>> 
>>> Thank you in advance,
>>> Veronica
>>> 
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> 
>> 
> 


