[BioC] Getting counts for previously undetected transcripts and genes with easyRNASeq: comparison to Cufflinks

Alex Gutteridge alexg at ruggedtextile.com
Thu Dec 13 17:12:16 CET 2012


On 13.12.2012 15:55, Richard Friedman wrote:
> On Dec 4, 2012, at 10:54 AM, Richard Friedman wrote:
>
>> On Dec 4, 2012, at 5:27 AM, Nicolas Delhomme wrote:
>>>
>>>
>>> As I said, I have a very similar setup, but completely de novo.
>>> I have been (and still am) testing several approaches:
>>>
>>> 1) running TopHat/Cufflinks/Cuffmerge (Cuffmerge to get the
>>> exon/gene GFF), then going back to the original TopHat alignments and
>>> using them, together with the GFF, as input to easyRNASeq. I then get
>>> my DESeq/edgeR output and proceed in R.
>>
>
> Dear Nico and list,
>
> 	I tried using the cuffmerge GTF file as the GFF file in TopHat, as
> you suggested, and successfully generated a BAM file, but easyRNASeq
> could not read it. (Although TopHat/Cufflinks are not Bioconductor
> programs, it is desirable that their output be readable by
> Bioconductor programs, so I hope that including a TopHat command does
> not go beyond the scope of this list.)

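You'll kick yourself, but it doesn't look like you put quotes around 
the file name? Without them, R looks for an object called 
SRR490224it.bam rather than a file, hence the "object not found" error.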
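For reference, a minimal R sketch of the fix (not part of the original exchange). It assumes Rsamtools is installed and, so that it runs without the poster's data, copies the example file ex1.bam shipped with Rsamtools into a temporary directory under the name SRR490224it.bam; the names workdir and old are illustrative only.

```r
## Sketch of the fix: pass indexBam() file names as character strings.
## Assumption: Rsamtools is installed; its bundled example BAM (ex1.bam)
## stands in for the poster's file so that the example runs anywhere.
library(Rsamtools)

## 'workdir' and 'old' are illustrative names, not from the thread.
workdir <- file.path(tempdir(), "bam_demo")
dir.create(workdir, showWarnings = FALSE)
file.copy(system.file("extdata", "ex1.bam", package = "Rsamtools"),
          file.path(workdir, "SRR490224it.bam"), overwrite = TRUE)
old <- setwd(workdir)

## Wrong: a bare name is looked up as an R object, giving
## "object 'SRR490224it.bam' not found".
## indexBam(SRR490224it.bam)

## Right: quote the file name ...
indexBam("SRR490224it.bam")

## ... or pass the character vector returned by dir(), which indexes
## every file it lists. "\\.bam$" is enough; no leading "*" is needed.
bamfiles <- dir(getwd(), pattern = "\\.bam$")
indexBam(bamfiles)

setwd(old)
```

Passing the existing bamfiles vector avoids typing each name by hand; in the poster's session, indexBam(bamfiles) would index both SRR490224it.bam and SRR490225it.bam in one call.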

>
> To realign the reads using the GTF file generated by Cufflinks, I used
>
> tophat -p 6 -G /Documents/clients/phyllis/humanlearndata/merged_asm/merged.gtf -o SRR490225_it_thout genome SRR490225.fastq
>
> where merged.gtf is the GTF file generated by Cufflinks.
>
> Then, here is a record of my easyRNASeq session:
>
> 
> ##############################################################################################
>
> R version 2.15.2 (2012-10-26) -- "Trick or Treat"
> Copyright (C) 2012 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
>   Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
> [R.app GUI 1.53 (6335) i386-apple-darwin9.8.0]
>
> [Workspace restored from /Users/friedman/.RData]
> [History restored from /Users/friedman/.Rapp.history]
>
>> library(easyRNASeq)
> Loading required package: parallel
> Loading required package: genomeIntervals
> Loading required package: intervals
> Loading required package: BiocGenerics
>
> Attaching package: ‘BiocGenerics’
>
> The following object(s) are masked from ‘package:stats’:
>
>     xtabs
>
> The following object(s) are masked from ‘package:base’:
>
>     anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get,
>     intersect, lapply, Map, mapply, mget, order, paste, pmax, pmax.int,
>     pmin, pmin.int, Position, rbind, Reduce, rep.int, rownames, sapply,
>     setdiff, table, tapply, union, unique
>
> Loading required package: Biobase
> Welcome to Bioconductor
>
>     Vignettes contain introductory material; view with
>     'browseVignettes()'. To cite Bioconductor, see
>     'citation("Biobase")', and for packages 'citation("pkgname")'.
>
> Loading required package: biomaRt
> Loading required package: edgeR
> Loading required package: limma
> Loading required package: Biostrings
> Loading required package: IRanges
>
> Attaching package: ‘IRanges’
>
> The following object(s) are masked from ‘package:intervals’:
>
>     expand, reduce
>
>
> Attaching package: ‘Biostrings’
>
> The following object(s) are masked from ‘package:intervals’:
>
>     type
>
> Loading required package: BSgenome
> Loading required package: GenomicRanges
> Loading required package: DESeq
> Loading required package: locfit
> locfit 1.5-8 	 2012-04-25
>
> Attaching package: ‘locfit’
>
> The following object(s) are masked from ‘package:GenomicRanges’:
>
>     left, right
>
> Loading required package: lattice
>
> Attaching package: ‘DESeq’
>
> The following object(s) are masked from ‘package:limma’:
>
>     plotMA
>
> Loading required package: Rsamtools
> Loading required package: ShortRead
> Loading required package: latticeExtra
> Loading required package: RColorBrewer
> Warning messages:
> 1: replacing previous import ‘coerce’ when loading ‘intervals’
> 2: replacing previous import ‘initialize’ when loading ‘intervals’
>> library(BSgenome.Hsapiens.UCSC.hg19)
>> library(Rsamtools)
>>
>> chr.sizes=seqlengths(Hsapiens)
>> chr.sizes
>
>                  chr1                  chr2                  chr3                  chr4
>             249250621             243199373             198022430             191154276
>
> ……..
>
>
>> bamfiles=dir(getwd(),pattern="*\\.bam$")
>> bamfiles
> [1] "SRR490224it.bam" "SRR490225it.bam"
>
>> indexBam(SRR490224it.bam)
> Error in indexBam(SRR490224it.bam) :
>   error in evaluating the argument 'files' in selecting a method for function 'indexBam': Error: object 'SRR490224it.bam' not found
>
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 easyRNASeq_1.4.2
>  [3] ShortRead_1.16.2                   latticeExtra_0.6-19
>  [5] RColorBrewer_1.0-5                 Rsamtools_1.10.2
>  [7] DESeq_1.10.1                       lattice_0.20-10
>  [9] locfit_1.5-8                       BSgenome_1.26.1
> [11] GenomicRanges_1.10.5               Biostrings_2.26.2
> [13] IRanges_1.16.4                     edgeR_3.0.4
> [15] limma_3.12.1                       biomaRt_2.14.0
> [17] Biobase_2.18.0                     genomeIntervals_1.14.0
> [19] BiocGenerics_0.4.0                 intervals_0.13.3
>
> loaded via a namespace (and not attached):
>  [1] annotate_1.34.1      AnnotationDbi_1.20.2 bitops_1.0-4.1       DBI_0.2-5
>  [5] genefilter_1.38.0    geneplotter_1.34.0   grid_2.15.2          hwriter_1.3
>  [9] RCurl_1.91-1         RSQLite_0.11.1       splines_2.15.2       stats4_2.15.2
> [13] survival_2.36-14     XML_3.9-4            xtable_1.7-0         zlibbioc_1.2.0
>
>
>> ls()
> [1] "bamfiles"  "chr.sizes"
>
> 
> ##############################################################################################
>
> Any suggestions would be appreciated.
>
> Thanks and best wishes,
> Rich
> Richard A. Friedman, PhD
> Associate Research Scientist,
> Biomedical Informatics Shared Resource
> Herbert Irving Comprehensive Cancer Center (HICCC)
> Lecturer,
> Department of Biomedical Informatics (DBMI)
> Educational Coordinator,
> Center for Computational Biology and Bioinformatics (C2B2)/
> National Center for Multiscale Analysis of Genomic Networks (MAGNet)/
> Columbia Initiative in Systems Biology
> Room 824
> Irving Cancer Research Center
> Columbia University
> 1130 St. Nicholas Ave
> New York, NY 10032
> (212)851-4765 (voice)
> friedman at cancercenter.columbia.edu
> http://cancercenter.columbia.edu/~friedman/
>
> In memoriam, Ray Bradbury

-- 
Alex Gutteridge
