[BioC] DEXSeq: problem with dexseq_prepare_annotation.py

Stephen Turner vustephen at gmail.com
Tue Apr 17 20:00:12 CEST 2012


Alejandro, Simon, Wolfgang, et al.:

I'm trying to use the dexseq_prepare_annotation.py script to parse the
UCSC hg18 genes.gtf file included with the Illumina iGenomes packages
(http://tophat.cbcb.umd.edu/igenomes.html), and I'm getting an error:

Traceback (most recent call last):
  File "/home/sdt5z/bin/dexseq_prepare_annotation.py", line 93, in <module>
    raise ValueError, "Same name found on two chromosomes: %s, %s" % ( str(l[i]), str(l[i+1]) )
ValueError: Same name found on two chromosomes: <GenomicFeature: exonic_part 'CFB' at chr6_qbl_hap2: 3167392 -> 3167602 (strand '+')>, <GenomicFeature: exonic_part 'CFB' at chr6_cox_hap1: 3359983 -> 3360325 (strand '+')>

I'm guessing this is because the same gene name (CFB) appears on two
different sequences, chr6_qbl_hap2 and chr6_cox_hap1. I'm not entirely
sure what these two chromosomal segments refer to, but I removed them
from the GTF file, and the Python script then threw another error:

Traceback (most recent call last):
  File "/home/sdt5z/bin/dexseq_prepare_annotation.py", line 91, in <module>
    assert l[i].iv.end <= l[i+1].iv.start, str(l[i+1]) + " starts too early"
AssertionError: <GenomicFeature: exonic_part 'HIST2H3C+HIST2H3A' at chr1: 148079388 -> 148078883 (strand '-')> starts too early
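
The removal step described above (dropping the chr6_qbl_hap2 and
chr6_cox_hap1 records before running the script) can be sketched
roughly as follows. This is only an illustration: the function name
keep_primary_chromosomes is hypothetical, and the pattern for what
counts as a "primary" chromosome (chr1..chr22, chrX, chrY, chrM) is an
assumption about the hg18 GTF, not something the DEXSeq script
requires.

```python
import re

# Assumption: only chr1..chr22, chrX, chrY and chrM are kept; anything
# else (e.g. chr6_qbl_hap2, chr6_cox_hap1) is dropped.
PRIMARY_CHROM = re.compile(r"^chr(\d+|X|Y|M)$")


def keep_primary_chromosomes(gtf_lines):
    """Yield GTF lines whose seqname (column 1) is a primary chromosome.

    Comment lines starting with '#' are passed through unchanged.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        # The seqname is the first tab-separated field of a GTF record.
        seqname = line.split("\t", 1)[0]
        if PRIMARY_CHROM.match(seqname):
            yield line
```

It could be applied to an open file with something like
out.writelines(keep_primary_chromosomes(open("genes.gtf"))), writing
the result to a new GTF that is then passed to
dexseq_prepare_annotation.py.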

I'm really unsure what to make of this or how to fix it. The script
works without issues with the Ensembl GTF. Any help would be greatly
appreciated.
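
One way to investigate both errors is to list the gene names that the
GTF places on more than one chromosome or strand. The sketch below
assumes that reused gene names are what trips the script, which is a
guess and is not confirmed here. The helper name
genes_at_multiple_loci is hypothetical, and it assumes the usual
nine-column GTF layout with a gene_id "..." attribute in column 9.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute in the GTF attributes column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def genes_at_multiple_loci(gtf_lines):
    """Return {gene_id: sorted [(chrom, strand), ...]} for gene_ids that
    occur on more than one (chromosome, strand) pair."""
    loci = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record; skip it
        match = GENE_ID.search(fields[8])
        if match:
            # fields[0] is the seqname, fields[6] is the strand.
            loci[match.group(1)].add((fields[0], fields[6]))
    return {gene: sorted(pairs) for gene, pairs in loci.items()
            if len(pairs) > 1}
```

Any gene reported by this check is a candidate for closer inspection
in the GTF before running the annotation script again.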

Stephen

-----------------------------------------
Stephen D. Turner, Ph.D.
Bioinformatics Core Director
University of Virginia School of Medicine
bioinformatics.virginia.edu


