[BioC] EasyRNASeq issue (missing bai files)

Fri Jul 20 12:27:42 CEST 2012

Hi Nico,

Thanks for your help - I hadn't replied yet as I wanted to try to 
index the bam file first and see if other problems arose (as they 
did...!).

I changed the arguments following the suggestions you made for the 
conditions and output; however, the chr.sizes didn't really work, and now 
I've got this error: " Error in .readGffGtf(filename = filename, 
ignoreWarnings = ignoreWarnings,  :
  The filename you provided:  does not exists
In addition: Warning message:
In easyRNASeq(filesDirectory = getwd(), filenames = 
c("J05_orig_genome.sorted.bam"),  :
  You enforce UCSC chromosome conventions, however the provided 
chromosome size list is not compliant. Correcting it."

I guess I'm also not doing something properly with the gff file, given 
the error, but since it didn't report any error about this before, I 
don't know why this happens now...
I tried to use library(BSgenome.Dmelanogaster.UCSC.dm3) to try to solve 
the chrsize issue but maybe that's what generated some incompatibility?

Let me paste here my current code to see if you spot a problem:

setwd("/Users/jbeira/Desktop/bams")
library("easyRNASeq")
count.table <- easyRNASeq(
filesDirectory=getwd(),
filenames=c("J05_orig_genome.sorted.bam"),
organism="Dmelanogaster",
readLength=75L,
chr.sizes=as.list(seqlengths(Dmelanogaster)),
annotationMethod="gff",
annotationFile=system.file("dmel-all-r5.45.gff"),
format="bam",
conditions=c("J05_orig_genome.sorted.bam"="wt"),
count="exons",
outputFormat="RNAseq")
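
One possible lead on the gff error, offered as a guess rather than a confirmed diagnosis: as far as I understand, system.file() looks for files inside installed packages (the "base" package by default) rather than in the working directory, and it returns an empty string when nothing is found. That would match the empty filename in the error message above. A minimal sketch, assuming the gff really sits in the same directory as the bam files:

```r
# system.file() searches installed packages (package "base" by default),
# not the working directory; it returns "" when no file matches.
gff.name <- "dmel-all-r5.45.gff"
from.package <- system.file(gff.name)
print(from.package == "")

# Build the path from the working directory instead (assumption: the gff
# was saved next to the bam files, as described earlier in this thread).
gff.path <- file.path(getwd(), gff.name)
print(file.exists(gff.path))
```

If file.exists() reports TRUE, passing annotationFile=gff.path instead of the system.file() call might be worth a try.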

--- You told me I didn't need the chrsize with the developer's version, 
but because of a proxy permission here at the institute I am not able 
to install the latest version of the package now, so if there's any way 
around this by defining the variable, that would be a good help.
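
One workaround I could imagine, as an untested sketch: reading the chromosome lengths from the bam header with scanBamHeader from Rsamtools and passing them as chr.sizes. The helper name headerToChrSizes is made up for illustration, and I am only assuming that a named list of lengths is what chr.sizes expects, and that the names in my bam header follow the convention easyRNASeq wants.

```r
# Hypothetical helper (not part of easyRNASeq or Rsamtools): turn the
# 'targets' vector of a bam header (sequence name -> length) into a
# named list, which is the shape as.list(seqlengths(...)) also produces.
headerToChrSizes <- function(targets) {
  stopifnot(!is.null(names(targets)))
  as.list(targets)
}

bam <- "J05_orig_genome.sorted.bam"
if (requireNamespace("Rsamtools", quietly = TRUE) && file.exists(bam)) {
  # scanBamHeader() returns one entry per file; 'targets' holds the lengths
  targets <- Rsamtools::scanBamHeader(bam)[[1]]$targets
  chr.sizes <- headerToChrSizes(targets)
  str(chr.sizes)
}
```

The resulting chr.sizes object would then replace as.list(seqlengths(Dmelanogaster)) in the call above, assuming the names are acceptable.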

Sorry if this is getting more complicated, thanks for your help!

Best
Jorge

On Wed Jul 18 16:49:24 2012, Nicolas Delhomme wrote:
> Dear Jorge,
>
> I've Cc'ed the Bioc mailing list as it can be of help to others.
>
> You're missing the index (.bam.bai) files for your bam files; i.e. you need to run 'samtools index TTGR1.bam' on the command line in your /Users/jbeira/Desktop/bams directory to create the TTGR1.bam.bai file.
>
> You may also want to use the indexBam function of the Rsamtools package. This package is required by easyRNASeq, so the following would index all the bam files in your bams directory:
>
> library(easyRNASeq)
>
> setwd("/Users/jbeira/Desktop/bams")
>
> indexBam(files=dir(".",pattern="\\.bam$"))
>
> Note that it is important for your bam files to be sorted first (check the samtools webpage for more info: http://samtools.sourceforge.net/)
>
> In your easyRNASeq call, you're missing the "conditions" argument that describes your samples (e.g. tumor, control). This is necessary if you want to produce a DESeq output. The conditions should be a named vector, the names being the actual filenames: e.g. in your case conditions=c("TTGR1.bam"="tumor"). Asking for a DESeq output with a single sample is not going to work, but I suppose you've got more than one sample :-); you can provide all of them at once in the filenames argument or use the pattern argument instead (as in the indexBam for example).
>
> Finally, if you are using easyRNASeq version 1.3.8 (the development version), you do not need to specify the chr.sizes argument, provided your bam files have a header (which they most probably have). The readLength would also be determined from your data, but it does no harm to provide it. Moreover, if the computer you're running on has enough memory and CPUs, you can process the input files in parallel using the nbCore argument (as of easyRNASeq version 1.3.8). The memory usage is roughly the size of the BAM files, i.e. if you have 12GB RAM, you could process 4 x 3GB bam files in parallel (in an ideal world; in practice I would go for 3 just in case).
>
> Cheers,
>
> Nico
>
> ---------------------------------------------------------------
> Nicolas Delhomme
>
> Genome Biology Computational Support
>
> European Molecular Biology Laboratory
>
> Tel: +49 6221 387 8310
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------
>
>
>
>
>
> On Jul 18, 2012, at 5:24 PM, Jorge Beira wrote:
>
>> Dear Nicolas,
>>
>> I am trying to use the easyRNASeq package to obtain read counts so that I can proceed with my analysis for DESeq. However I must be doing something wrong in giving it the right arguments, since it gives me errors like
>> " Error in easyRNASeq(filesDirectory = getwd(), filenames = c("TTGR1.bam"),  :
>>   Index files (bai) are required. They are missing for the files: /Users/jbeira/Desktop/bams/TTGR1.bam "
>>
>> Info: I have my bam files in a folder "bams" in my Desktop, and I also added the Drosophila .gff file on the same directory. So the whole code I'm trying to run is:
>>
>>
>> setwd("~/Desktop/bams")
>> library("easyRNASeq")
>> count.table <- easyRNASeq(
>> filesDirectory=getwd(),
>> filenames=c("TTGR1.bam"),
>> organism="Dmelanogaster",
>> chr.sizes=as.list(seqlengths(Dmelanogaster)),
>> readLength=75L,
>> annotationMethod="gff",
>> annotationFile=system.file("dmel-all-r5.45.gff"),
>> format="bam",
>> count="exons",
>> outputFormat="DESeq")
>>
>>
>> If you could help me spot where the problem is, it'd be great. Thanks a lot!
>>
>> Best wishes
>>
>> Jorge Beira
>> National Institute for Medical Research
>> and University College London, UK
>
>