[BioC] get synthetic exon dataset with easyRNASeq
delhomme at embl.de
Fri Oct 12 09:36:32 CEST 2012
I've CC'd the Bioc mailing list in case this is of interest to others.
On Oct 11, 2012, at 6:15 PM, Meritxell Oliva wrote:
> Hi Nicolas,
> I am an easyRNASeq "newbie" user.
> First of all, congratulations for the development of the pipeline: so far it's one of the best R libraries I have found to deal with RNASeq data, as it tries to tackle problematic issues such as unique read-exon count assignment and also wraps the normalization packages (DESeq, edgeR), so you get all you need in one go. Thanks!
Thanks, that's nice to hear. Let me know as well whenever you encounter problems or think of new features!
> I do have a question: I would like to create a non-redundant, synthetic exon dataset, using the Ensembl68 gene models. From what I understand from the manual, when using easyRNASeq(), if you summarize your counts with counts=gene,summarization=geneModels, this synthetic exon dataset is generated in order to create unique read-exon correspondences. This is what I do, and I store the result as an RNAseq object, to preserve the genomic annotation. However, the annotation that I get if I apply the function genomicAnnotation() to this object is the original one from Ensembl, with redundant exons shared between transcripts. I would like to get the synthetic exon dataset, to select unique coding regions for each gene transcript.
> How can I get this dataset? My ultimate goal is to perform differential gene expression analysis at the gene, transcript and exon level. The first one is solved, and I want to find the best way to perform the latter two.
At the moment it's still a two-step process, but I plan on making that easier. You first need to run easyRNASeq(counts=gene,summarization=geneModels,etc...), asking for an "RNAseq" outputFormat:

  rnaSeq <- easyRNASeq(counts=gene,summarization=geneModels,outputFormat="RNAseq",etc...)

This will give you an object of class RNAseq that contains the gene model annotation, accessible through geneModel(rnaSeq). That's a RangedData object containing the synthetic exons, although it is still redundant for genes located on opposite strands. So if you're not using stranded RNAseq data, you need to do some more filtering.
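To illustrate what that extra filtering could look like for unstranded data, here is a minimal sketch in base R. It is only one possible filter, not necessarily the one intended above: it drops every synthetic exon that overlaps an exon of a different gene, ignoring strand. The data frame is a toy stand-in for the content of geneModel(rnaSeq); its column names (chr, start, end, strand, gene) and the helper overlapsOtherGene() are hypothetical and would need adapting to the real object.

```r
# Toy stand-in for the synthetic exons returned by geneModel(rnaSeq).
# The column names are assumptions made for this example only.
exons <- data.frame(
  chr    = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  start  = c(100, 300, 250, 600, 100),
  end    = c(200, 400, 350, 700, 200),
  strand = c("+", "+", "-", "-", "+"),
  gene   = c("geneA", "geneA", "geneB", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns TRUE for each exon that overlaps an exon of a
# different gene on the same chromosome. Strand is deliberately ignored,
# which is the situation with unstranded RNA-Seq data.
overlapsOtherGene <- function(x) {
  vapply(seq_len(nrow(x)), function(i) {
    any(x$chr == x$chr[i] &
        x$gene != x$gene[i] &
        x$start <= x$end[i] &
        x$end >= x$start[i])
  }, logical(1))
}

# Flag the ambiguous exons and keep only those unique to a single gene.
ambiguous   <- overlapsOtherGene(exons)
uniqueExons <- exons[!ambiguous, ]
print(uniqueExons)
```

In this toy example the second geneA exon (300-400, "+") and the first geneB exon (250-350, "-") overlap on opposite strands, so both are removed. The pairwise comparison is quadratic in the number of exons, so for a full annotation the overlap functions in IRanges/GenomicRanges (e.g. findOverlaps) would be the more appropriate tool.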
> Can you help me?
Hope this did, let me know if not,
> Meritxell Oliva
> PhD student
> IBB (Biotechnology and Biomedicine Institute)
> Comparative and Functional Genomics group
> Campus Universitari - 08193 Bellaterra Cerdanyola del Vallès - Barcelona