[BioC] Getting counts for previously undetected transcripts and genes with easyRNASeq: comparison to Cufflinks

Ryan C. Thompson rct at thompsonclan.org
Mon Dec 3 23:38:35 CET 2012


One trick I have used for incompletely annotated genomes is to run my 
sequencing data through TopHat and Cufflinks to get a set of transcripts 
that covers all the mapped reads (while also providing Cufflinks with 
whatever annotation is already available). Then I use cuffmerge to both 
merge transcripts from multiple samples *and* associate the 
Cufflinks-assigned IDs (e.g. "XLOC_000001", etc.) with their 
corresponding IDs in the prior annotation. Then I can do my read 
counting on the Cufflinks assembly, but still get meaningful gene IDs 
for annotated genes.
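
As a rough illustration of the ID-association step, here is a minimal 
Python sketch that reads a cuffmerge-style GTF and maps each "XLOC_" 
ID to any reference gene names attached to it. It assumes the merged 
GTF carries "gene_id" and (for annotated loci) "gene_name" attributes; 
the sample lines, the function names, and the "(novel)" label are made 
up for illustration and are not part of any tool.

```python
import re
from collections import defaultdict

# Hypothetical two-line excerpt standing in for a cuffmerge merged.gtf.
# Assumption: annotated loci carry a gene_name attribute, novel ones do not.
SAMPLE_GTF = """\
chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "GeneA"; oId "tx1"; nearest_ref "tx1"; class_code "=";
chr1\tCufflinks\texon\t500\t650\t.\t+\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; oId "CUFF.2.1"; class_code "u";
"""

# Matches GTF attributes of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')


def parse_attributes(field):
    """Turn the ninth GTF column into a dict of attribute -> value."""
    return dict(ATTR_RE.findall(field))


def locus_to_gene_names(gtf_text):
    """Map each gene_id (XLOC_...) to the set of gene_name values seen for it."""
    mapping = defaultdict(set)
    for line in gtf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF record
        attrs = parse_attributes(cols[8])
        locus = attrs.get("gene_id")
        if locus is None:
            continue
        names = mapping[locus]  # creates an empty set for unseen loci
        if attrs.get("gene_name"):
            names.add(attrs["gene_name"])
    return dict(mapping)


def label(locus, names):
    """Use the reference name(s) when available, else flag the locus as novel."""
    return ",".join(sorted(names)) if names else locus + " (novel)"


if __name__ == "__main__":
    for locus, names in sorted(locus_to_gene_names(SAMPLE_GTF).items()):
        print(locus, "->", label(locus, names))
```

The resulting labels can then be attached to the rows of whatever 
count table is built from the Cufflinks assembly, so annotated genes 
keep their familiar names and unannotated loci remain identifiable.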

On Mon 03 Dec 2012 01:03:49 PM PST, Richard Friedman wrote:
> Dear Bioconductor List,
>
> 	I am working through the  easyRNAseq use case to learn how
> to obtain counts in RNASeq experiments for further analysis.
> The example starts with BAM files and converts BAM files to Exons,
> transcripts and genes using geneModels.
>
> Does using geneModels only assemble previously annotated transcripts and
> genes OR can it find new ones if present?
>
> If it can find new ones how well does it do this in comparison to Cufflinks?
>
> If it cannot find new ones - is there a way to get counts (as distinct from FPKM
> values) for genes and transcripts from Cufflinks and
> relate them to existing annotation where they correspond and present them
> as previously unannotated ones when they don't correspond?
>
> Can easyRNASeq be used for this purpose?
> Can anyone recommend a tool that can be used for this purpose?
>
> My goal is to get a set of counts that can be input into CQN and then edgeR.
> I wish to use TopHat/Cufflinks to get the Exons, transcripts, and genes including
> novel spliced variants but I am persuaded CQN is a better way to normalize than
> FPKM and edgeR is a better way to analyze differential expression than
> Cuffdiff.
>
> I would appreciate any advice.
>
> Thanks and best wishes,
> Rich
> Richard A. Friedman, PhD
> Associate Research Scientist,
> Biomedical Informatics Shared Resource
> Herbert Irving Comprehensive Cancer Center (HICCC)
> Lecturer,
> Department of Biomedical Informatics (DBMI)
> Educational Coordinator,
> Center for Computational Biology and Bioinformatics (C2B2)/
> National Center for Multiscale Analysis of Genomic Networks (MAGNet)/
> Columbia Initiative in Systems Biology
> Room 824
> Irving Cancer Research Center
> Columbia University
> 1130 St. Nicholas Ave
> New York, NY 10032
> (212)851-4765 (voice)
> friedman at cancercenter.columbia.edu
> http://cancercenter.columbia.edu/~friedman/
>
> In memoriam, Ray Bradbury
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


