[BioC] HT-Seq count and GTF

Steve Lianoglou lianoglou.steve at gene.com
Wed Jul 17 17:11:11 CEST 2013


Hi Jose,

On Wed, Jul 17, 2013 at 3:26 AM, Jose M Garcia Manteiga
<garciamanteiga.josemanuel at hsr.it> wrote:
>>
>> Dear Simon,
>> I am using HT-Seq count to obtain counts on sorted bam files based on gene-id as you recommended, but I had a problem.
>>
>> I started by using the GTF file from UCSC Ensembl genes (build 72), which seemed to work fine and produced a file of counts per bam file. However, I realised, looking at one of our genes of interest, which "had" to be expressed, that the counts were 0.
[snip]

This does not answer your question, but perhaps you might like to try
an alternative "all-bioc" approach to counting reads over genes. This
is outlined in the vignette to the parathyroidSE data package here:

http://bioconductor.org/packages/release/data/experiment/vignettes/parathyroidSE/inst/doc/parathyroidSE.pdf

Look at section 4 (counting reads in genes), which uses the
GenomicRanges::summarizeOverlaps method.

You should also read through how the different summarizeOverlaps
parameters affect the total number of reads that are tallied per
"feature," which is outlined here:

http://bioconductor.org/packages/release/bioc/vignettes/GenomicRanges/inst/doc/summarizeOverlaps.pdf
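
To make the effect of the counting mode concrete, here is a toy sketch
in plain Python. It is not summarizeOverlaps or htseq-count code: it
only models the three overlap-resolution rules that both tools
describe (union, intersection-strict, intersection-nonempty) on
made-up intervals. The gene names, coordinates, and function names are
all hypothetical, and the "ambiguous" and "no_feature" labels are kept
only to show where reads go; the real tools may report or drop such
reads differently.

```python
from collections import Counter

def features_at(pos, features):
    """Return the names of all features whose intervals cover base `pos`."""
    return {name for name, intervals in features.items()
            if any(start <= pos < end for start, end in intervals)}

def assign(read, features, mode="union"):
    """Assign one read (half-open interval) to a feature under `mode`."""
    start, end = read
    # Set of overlapping features at every base of the read.
    per_base = [features_at(p, features) for p in range(start, end)]
    if not per_base:
        return "no_feature"
    if mode == "union":
        hits = set().union(*per_base)
    elif mode == "intersection-strict":
        hits = set.intersection(*per_base)
    elif mode == "intersection-nonempty":
        nonempty = [s for s in per_base if s]
        hits = set.intersection(*nonempty) if nonempty else set()
    else:
        raise ValueError("unknown mode: %s" % mode)
    if not hits:
        return "no_feature"
    if len(hits) > 1:
        return "ambiguous"
    return next(iter(hits))

def count_reads(reads, features, mode):
    """Tally reads per feature (plus the two special labels)."""
    return Counter(assign(read, features, mode) for read in reads)

# Hypothetical annotation: two genes that overlap between 180 and 200.
features = {"geneA": [(100, 200)], "geneB": [(180, 300)]}

# Hypothetical reads as half-open (start, end) intervals.
reads = [
    (110, 150),  # entirely inside geneA
    (190, 230),  # starts in the A/B overlap, ends in geneB only
    (90, 120),   # hangs off the 5' edge of geneA
    (185, 195),  # entirely inside the A/B overlap
    (400, 420),  # touches nothing
]

for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, dict(count_reads(reads, features, mode)))
```

The same five reads give a different count for geneA and geneB under
each rule, so a gene can look under-counted purely because of how the
chosen mode treats reads that overlap a neighbouring feature or extend
past a feature boundary.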

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech


