[BioC] bam file in easyRNASeq

Nicolas Delhomme delhomme at embl.de
Fri Dec 14 09:39:34 CET 2012


Hi Silav,

On 14 Dec 2012, at 07:04, Silav Bremos wrote:

> Nico, Thank you.
> filenames="accepted_hits.bam" was the problem. It worked with just that correction.
> Not necessary to install R-2.15.2. Just a BiocUpgrade to Bioconductor version 2.11 (BiocInstaller 1.8.3)
> was sufficient. sessionInfo() showed that easyRNASeq is 1.4.2.  easyRNASeq and all the packages supporting
> the easyRNASeq lib were built under 2.15.1 or 2.15.2.

Great.

> -- Can you tell me what readLength is used? As readLength() can be left out.

It is a remnant of earlier implementation. It is now derived from the data.

> -- Is "geneModels" only option for summarization?

Yes and No. There is another one, but I would not advise using it. I'm currently discussing with Valerie Obenchain (Bioconductor) and we plan to come up with a single counting method shared by easyRNASeq, summarizeOverlaps, etc. 
Can you precise what you would be interested in as summarization methods?

> -- I am going to see how this compares with count table from python HTSeq package.

You don't need to do that. If you search the mailing list archive, Mark Robison reported an extensive study about this (the title of the thread was "counting RNA-seq reads in R/BioC").

Cheers,

Nico

> --------------------------------------------
> countTable <- easyRNASeq(filesDirectory=getwd(),
>     filenames="accepted_hits.bam",
>     organism="Dmelanogaster",
>     #chr.sizes=as.list(seqlengths(Dmelanogaster)),
>     #readLength=36L,
>     annotationMethod="gtf",
>     annotationFile="/myrnaseqB/genes.gtf",
>       format="bam", 
>     count="genes", summarization="geneModels"
>     )    
> > head(countTable)
>               accepted_hits.bam
> "FBgn0000003"              1632
> "FBgn0000008"              1156
> "FBgn0000014"               198
> "FBgn0000015"               129
> "FBgn0000017"              5270
> "FBgn0000018"               616
> > tail(countTable)
>               accepted_hits.bam
> "FBgn0261570"              5733
> "FBgn0261571"                 3
> "FBgn0261572"                 5
> "FBgn0261573"              5532
> "FBgn0261574"              7067
> "FBgn0261575"                14
> ---------------------------------------- 
> 
> From: Nicolas Delhomme <delhomme at embl.de>
> To: Silav Bremos <silavb at yahoo.com> 
> Cc: "bioconductor at r-project.org" <bioconductor at r-project.org> 
> Sent: Thursday, December 13, 2012 3:22 AM
> Subject: Re: [BioC] bam file in easyRNASeq
> 
> Hi Silav,
> 
> You need to give the BAM file as input (filenames="accepted_hits.bam") and not the BAM index file ("accepted_hits.bam.bai"). The bam index file is necessary though, but it's filename is deduced from the BAM file name.
> 
> You're using an older version of R/Bioc/easyRNASeq than the currently supported one. You should move R to 2.15.2 with Bioc 2.11 (easyRNASeq 1.4.2). I hadn't seen your comment about the IT before writing that, but you can very probably install that as your own user in your home directory. Can't you really install the latest R dmg package as a user (I haven't tried this myself)?
> 
> Here is how I do myself (on Mac too) using the terminal. You might need some system libraries to be installed, but they might be there if IT has R already installed on the system. If not (my case), I'm using macports (http://www.macports.org/) to get the necessary libraries (takes a little while to set up, but it works like a charm now). 
> 
> download the latest R from you CRAN mirror
> tar -zxf R-2.15.2.tar.gz
> mv R-2.15.2 R-2.15.2_SRC
> cd R-2.15.2_SRC
> ./configure --prefix=`pwd`/../R-2.15.2 --enable-R-shlib
> make
> make check
> make pdf
> make info
> make install
> make install-info
> make install-pdf
> 
> Then R is accessible as R-2.15.2/bin/R. Just adding R-2.15.2/bin to the PATH environment variable makes you using your R whenever you type R on the command line.
> HTH,
> 
> Nico
> ---------------------------------------------------------------
> Nicolas Delhomme
> Genome Biology Computational Support
> European Molecular Biology Laboratory
> Tel: +49 6221 387 8310
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------
> 
> On 13 Dec 2012, at 07:42, Silav Bremos wrote:
> 
> > Hello list
> > 
> > rnaSeq <- easyRNASeq(filesDirectory=getwd(),
> >    filenames="accepted_hits.bam.bai",
> >    organism="Dmelanogaster",
> >    chr.sizes=as.list(seqlengths(Dmelanogaster)),
> >    readLength=36L,
> >    annotationMethod="gtf",
> >    annotationFile="/myrnaseqB/genes.gtf",
> >      format="bam", gapped=TRUE,
> >    count="exons",
> >    outputFormat="RNAseq")    
> > 
> > I keep getting this error:
> > Checking arguments... 
> > Error in easyRNASeq(filesDirectory = getwd(), filenames = "accepted_hits.bam.bai",  : 
> >  Index files (bai) are required. They are missing for the files: /myrnaseqB/C2_R2_thout/accepted_hits.bam.bai
> >  
> > even after 
> >> sortBam("accepted_hits.bam","accepted_hits") 
> >> indexBam(files="accepted_hits.bam")
> >> file.exists("/myrnaseqB/C2_R2_thout/accepted_hits.bam.bai")
> > [1] TRUE
> > 
> > Please help. I will not be able to upgrade or reinstall. IT dept does that on their schedule.
> > 
> > Silav
> > --------------------------------------------------------- 
> >> sessionInfo()
> > R version 2.15.0 (2012-03-30)
> > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> > 
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> > 
> > attached base packages:
> > [1] parallel  stats    graphics  grDevices utils    datasets  methods  base    
> > 
> > other attached packages:
> >  [1] easyRNASeq_1.2.5      ShortRead_1.14.4      latticeExtra_0.6-24    RColorBrewer_1.0-5    
> >  [5] lattice_0.20-10        Rsamtools_1.8.6        DESeq_1.8.3            locfit_1.5-8          
> >  [9] BSgenome_1.24.0        GenomicRanges_1.8.13  Biostrings_2.24.1      IRanges_1.14.4        
> > [13] edgeR_2.6.12          limma_3.12.3          biomaRt_2.12.0        Biobase_2.16.0        
> > [17] genomeIntervals_1.12.0 BiocGenerics_0.2.0    intervals_0.13.3      
> > 
> > loaded via a namespace (and not attached):
> >  [1] annotate_1.34.1      AnnotationDbi_1.18.4 bitops_1.0-4.2      DBI_0.2-5            genefilter_1.38.0  
> >  [6] geneplotter_1.34.0  grid_2.15.0          hwriter_1.3          RCurl_1.95-3        RSQLite_0.11.2      
> > [11] splines_2.15.0      stats4_2.15.0        survival_2.37-2      tools_2.15.0        XML_3.95-0.1        
> > [16] xtable_1.7-0        zlibbioc_1.2.0      
> >     [[alternative HTML version deleted]]> 
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 
> 



More information about the Bioconductor mailing list