[BioC] error in easyRNASeq

Nicolas Delhomme delhomme at embl.de
Fri Nov 9 09:54:48 CET 2012


Dear Mayte,

This message tell you what is happening:

> Some of your read coordinates are bigger than the chromosome sizes you provided. 

I.e. some of your reads in your bam file have coordinates that are outside the chromosomal range you define.

Moreover your version of easyRNASeq is fairly outdated, 1.2.5 versus 1.4.2 in the repository. If you get that last version and have a look at the vignette, you'll see how this error can be handled more easily now. I'll sketch it below:

## you do not need to load other libraries, 
## all the one you mentioned below (from Rsamtools to parallel) are loaded by easyRNASeq
library(easyRNASeq)

## if your bam file has a proper header, you do not need to precise the BSgenome library either

## note that your condition vector need to be named
BamPath<-'/MyBAMDirectoryBAM/'
fls = list.files(path=BamPath,recursive=TRUE, pattern="*all.accepted_hits.bam$", full=FALSE)
conditions<-c(A,A,A,B,B,B)
names(conditions) <- fls

## count the genes
## the chromosome size is extracted from the bam header.
## it might be that there are discrepancies between the chromosome naming 
## between your bam file and the annotation you retrieve from biomaRt. Check out the use case in the new vignette if that occurs.

CountsGenes_biomart <- easyRNASeq(
	filesDirectory=BamPath,
	filenames=fls, nbCore=5,
	organism='Mmusculus',

	## not necessary if in your bam file header
	## chr.sizes=as.list(seqlengths(Mmusculus)),

	## not necessary either, will be determined from your read alignment
	## readLength=50L,

	## it seems to me that you used tophat to align your reads, so
	## if it is the case, you'll probably have gapped alignments
	## in which case you want to use the following (which I leave commented as it is just a guess from me):
	## gapped = TRUE,

	annotationMethod="biomaRt",
	format="bam",
	count=c('genes'),
	outputFormat=c("edgeR"),
	summarization=c("geneModels"),
	conditions=conditions)

Hope this helps,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------





On Nov 8, 2012, at 7:14 PM, Mayte Suarez-Farinas wrote:

> Dear All,
> 
> I am trying to use the easyRNASeq package with 6 RNAseq samples from mouse.
> Single end, unstranded protocol, 50bp reads. I followed the vignettes and wrote this short code:
> 
> library("easyRNASeq")
> library(Rsamtools)
> library(DESeq)
> library(edgeR)
> library(GenomicRanges)
> library(parallel)
> library(BSgenome.Mmusculus.UCSC.mm10)
> 
> BamPath<-'/MyBAMDirectoryBAM/'
> fls = list.files(path=BamPath,recursive=TRUE, pattern="*all.accepted_hits.bam$", full=FALSE)
> conditions<-c(A,A,A,B,B,B)
> 
> CountsGenes_biomart<- easyRNASeq(filesDirectory=BamPath,
> +   filenames=fls, nbCore=5,
> +   organism='Mmusculus',
> +   chr.sizes=as.list(seqlengths(Mmusculus)),
> +   readLength=50L,
> +            annotationMethod="biomaRt",
> +            format="bam",
> +            count=c('genes'),
> +            outputFormat=c("edgeR"),
> +            summarization=c("geneModels"),
> +            conditions=conditions)
> 
> But I get the following Error:
> 
> Error in fetchCoverage(obj, format = format, filename = file, filter = filter,  :
>  Some of your read coordinates are bigger than the chromosome sizes you provided. Aborting!
> In addition: Warning messages:
> 1: In easyRNASeq(filesDirectory = BamPath, filenames = fls, nbCore = 5,  :
>  You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it.
> 2: In easyRNASeq(filesDirectory = BamPath, filenames = fls, nbCore = 5,  :
>  There are 5478 synthetic exons as determined from your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want?
> 3: In fetchCoverage(obj, format = format, filename = file, filter = filter,  :
>  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
> 
> Can some one shed some light on the msitake I am making here?? sessionInfo is included bellow,
> 
> Thanks in advance
> 
> Mayte Suarez-Farinas
> 
> 
> 
> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
> 
> other attached packages:
> [1] BSgenome.Dmelanogaster.UCSC.dm3_1.3.17
> [2] RnaSeqTutorial_0.0.10
> [3] BSgenome.Mmusculus.UCSC.mm10_1.3.17
> [4] easyRNASeq_1.2.5
> [5] ShortRead_1.14.4
> [6] latticeExtra_0.6-24
> [7] RColorBrewer_1.0-5
> [8] lattice_0.20-10
> [9] Rsamtools_1.8.6
> [10] DESeq_1.8.3
> [11] locfit_1.5-8
> [12] BSgenome_1.24.0
> [13] GenomicRanges_1.8.13
> [14] Biostrings_2.24.1
> [15] IRanges_1.14.4
> [16] edgeR_2.6.12
> [17] limma_3.12.3
> [18] biomaRt_2.12.0
> [19] Biobase_2.16.0
> [20] genomeIntervals_1.12.0
> [21] BiocGenerics_0.2.0
> [22] intervals_0.13.3
> [23] BiocInstaller_1.4.9
> 
> loaded via a namespace (and not attached):
> [1] affy_1.34.0           affyio_1.24.0         annotate_1.34.1
> [4] AnnotationDbi_1.18.4  bitops_1.0-4.1        DBI_0.2-5
> [7] gcrma_2.28.0          genefilter_1.38.0     geneplotter_1.34.0
> [10] grid_2.15.1           hwriter_1.3           preprocessCore_1.18.0
> [13] RCurl_1.95-1.1        RSQLite_0.11.2        simpleaffy_2.32.0
> [16] splines_2.15.1        stats4_2.15.1         survival_2.36-14
> [19] tools_2.15.1          XML_3.95-0.1          xtable_1.7-0
> [22] zlibbioc_1.2.0
>> 
> 
> Mayte Suarez-Farinas
> Research Assistant Professor,  Laboratory of Investigative Dermatology
> Biostatistician,  Center for Clinical and Translational Science
> The Rockefeller University
> 1230 York Ave, Box 178,
> New York, NY, 10065
> Phone: +1(212) 327-8213
> Fax:      +1(212) 327-8232
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list