[BioC] Question about CSAMA10 "Lab-8-RNAseqUseCase.pdf" tutorial on bioconductor website.
Simon Anders
anders at embl.de
Mon Sep 27 18:31:48 CEST 2010
Hi
I was never too happy about what we wrote there in Lab 8. Paul has
pointed out one major issue. The other is overlapping genes: standard
RNA-Seq does not recover information about which strand a transcript is
from, and in more crowded genomes it is not that rare for exons of two
different genes on opposite strands to overlap.
The code in the lab does not address this. If genes A and B overlap,
then every read that maps onto this overlap will be counted for both
genes. If gene A is differentially expressed and gene B is not, the
extra counts from gene A that are also counted for gene B might cause
gene B to be called differentially expressed, too.
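To make the double counting concrete, here is a minimal toy sketch in
Python. It is neither the lab code nor the script mentioned in this
message: the gene coordinates, the reads, and the function names are
all made up for illustration. It contrasts naive per-gene counting with
one possible alternative, which sets aside reads that hit more than one
gene as "ambiguous".

```python
# Toy data (hypothetical): half-open [start, end) exon intervals.
# Genes A and B lie on opposite strands and overlap in [180, 200).
genes = {
    "A": (100, 200),
    "B": (180, 300),
}

# Unstranded reads as (start, end); the strand of origin is unknown.
reads = [(110, 146), (185, 199), (182, 198), (250, 286), (400, 436)]


def overlapping_genes(read, genes):
    """Return the names of all genes whose interval overlaps the read."""
    start, end = read
    return sorted(
        name for name, (g_start, g_end) in genes.items()
        if start < g_end and g_start < end
    )


def count_naive(reads, genes):
    """Count each read once for every gene it overlaps."""
    counts = dict.fromkeys(genes, 0)
    for read in reads:
        for name in overlapping_genes(read, genes):
            counts[name] += 1
    return counts


def count_unambiguous(reads, genes):
    """Count a read for a gene only if it overlaps exactly one gene."""
    counts = dict.fromkeys(genes, 0)
    counts["ambiguous"] = 0
    counts["no_feature"] = 0
    for read in reads:
        hits = overlapping_genes(read, genes)
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif hits:
            counts["ambiguous"] += 1
        else:
            counts["no_feature"] += 1
    return counts


print(count_naive(reads, genes))
print(count_unambiguous(reads, genes))
```

With the naive count, the two reads in the overlap are added to both A
and B, so a change in A's expression leaks into B's count. The second
function keeps them out of both per-gene totals.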
All this is not likely to have large effects on results, but it would be
nicer to do it properly. As you have already noticed, it is not exactly
trivial to code something like this in a correct and efficient manner in
R. At least I think so. I'm sure the IRanges gurus on the list will now
jump on me with examples on how easy it would have been, but I found it
much easier to code this in Python. The script I made for this purpose
is available at
http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
It is actually part of a larger framework to make coding such stuff in
Python easy. Have a look: http://www-huber.embl.de/users/anders/HTSeq
Simon
+---
| Dr. Simon Anders, Dipl.-Phys.
| European Molecular Biology Laboratory (EMBL), Heidelberg
| office phone +49-6221-387-8632
| preferred (permanent) e-mail: sanders at fs.tum.de