[BioC] Problems easyRNASeq

Thu Apr 12 20:52:13 CEST 2012

Hi Steven,

Using only a transcriptome annotation is something I haven't tried yet, but it should not be a problem.

I assume you will align your reads against your generated transcriptome, right?

If your transcripts are unique, i.e. none of them are isoforms of one another, all you need to do to get a count table is to count how many reads align to each transcript, and that information is already in your BAM file. You would not need the overhead of easyRNASeq for that. Reading your BAM file with the Rsamtools scanBam function (with the appropriate ScanBamParam parameters) and tabulating the reference names (the rname field, i.e. the transcript each read aligned to) should be pretty straightforward and give you what you need.
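For readers working outside R, the same tabulation can be sketched in Python. This is a minimal sketch, assuming the BAM file has first been converted to SAM text (for example with `samtools view`). It counts each read once, at its primary alignment, on the reference sequence it hit (SAM column 3, RNAME), and skips unmapped, secondary and supplementary records. The function name and the toy records are hypothetical; with Rsamtools the equivalent would be tabulating the `rname` field returned by `scanBam`.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM format specification
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800
SKIP_FLAGS = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY


def count_reads_per_transcript(sam_lines):
    """Count primary, mapped alignments per reference sequence (SAM column 3, RNAME).

    When reads are aligned to a de novo transcriptome, every reference
    sequence is a transcript, so the result is a reads-per-transcript table.
    Paired-end mates are counted individually.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, ...)
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & SKIP_FLAGS:
            continue  # unmapped, secondary or supplementary: not a primary hit
        counts[rname] += 1
    return counts


# Hypothetical toy input: two transcripts, four reads.
example_sam = (
    "@HD\tVN:1.0\n"
    "@SQ\tSN:tx1\tLN:1000\n"
    "@SQ\tSN:tx2\tLN:800\n"
    "read1\t0\ttx1\t10\t60\t50M\t*\t0\t0\t*\t*\n"
    "read2\t16\ttx1\t200\t60\t50M\t*\t0\t0\t*\t*\n"
    "read3\t0\ttx2\t5\t60\t50M\t*\t0\t0\t*\t*\n"
    "read3\t256\ttx1\t300\t0\t50M\t*\t0\t0\t*\t*\n"
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n"
)

if __name__ == "__main__":
    counts = count_reads_per_transcript(example_sam.splitlines())
    for transcript, n in sorted(counts.items()):
        print(transcript, n)
```

Run as a script, it prints `tx1 2` and `tx2 1`: the secondary hit of read3 on tx1 and the unmapped read4 are not counted.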

Now, if we assume that your transcripts are not unique, i.e. that you do have isoforms in your data, we need to do some additional processing, and then easyRNASeq might come in handy to avoid counting reads several times. An important parameter in that case is how you decide to run your aligner. To ensure that reads can match several isoforms, you'll need to allow multiple mapping. It would be worth estimating the highest number of isoforms per gene in your data and using that as the multiple-mapping threshold for your aligner, so that it returns neither only uniquely mapped reads nor too many alignments per read. I would need to think a bit more about how to prepare the data for easyRNASeq, and whether it makes sense to use it in that setup. A data excerpt would also help in that case.
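The isoform estimate suggested above could be sketched as follows. This assumes the transcripts are described in a GTF file (such as the one Steven generated with Cufflinks) whose records carry `gene_id` and `transcript_id` attributes. The sketch groups transcript IDs by gene and reports the size of the largest group, a candidate value for the aligner's multiple-mapping limit. The function names and the toy records are hypothetical.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in GTF column 9, e.g. gene_id "CUFF.1"; transcript_id "CUFF.1.1";
ATTR_RE = re.compile(r'([A-Za-z_]\w*)\s+"([^"]*)"')


def isoforms_per_gene(gtf_lines):
    """Group the transcript_ids of a GTF file by their gene_id."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a feature line
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return genes


def max_isoforms(gtf_lines):
    """Highest number of distinct transcripts sharing one gene_id (0 if none)."""
    return max((len(tx) for tx in isoforms_per_gene(gtf_lines).values()), default=0)


# Hypothetical toy GTF: gene CUFF.1 has two isoforms, CUFF.2 has one.
example_gtf = [
    'contig1\tCufflinks\ttranscript\t1\t900\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
    'contig1\tCufflinks\texon\t1\t900\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1"; exon_number "1";',
    'contig1\tCufflinks\ttranscript\t1\t700\t1000\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.2";',
    'contig2\tCufflinks\ttranscript\t1\t500\t1000\t-\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
]

if __name__ == "__main__":
    print(max_isoforms(example_gtf))
```

On the toy input this prints `2`. The result is only as good as the gene_id grouping produced by the assembler, and the option that sets the multiple-hit limit differs from one aligner to another, so it is not shown here.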

Let me know which situation applies to you before we take the matter further,

Cheers,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------

On 12 Apr 2012, at 10:07, Yates, Steven A wrote:

> Dear Sir/Madam
> 
> I am in the process of learning how to use the easyRNASeq package for
> Bioconductor but have a question/problem. The problem is that the
> organisms I am working with do not have any comprehensive genome
> information (or any prior sequencing); in effect, I will be creating a
> de novo transcriptome. Therefore there is no annotation file available
> for me to use; all I have is a list of transcripts. How will this work
> with this package? I am quite happy for the results to be reads per
> transcript, etc. Is it necessary to create an annotation file (GFF) for
> this purpose or not? I have created a GTF file using Cufflinks; should
> that be OK?
> 
> The second problem I am encountering concerns chrSizes. How do I get
> around the chromosome sizes problem? Any advice would be appreciated.
> 
> many thanks
> 
> Steven Yates
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor