[BioC] Warnings with running easyRNASeq with easyRNASeq tutorial

Nicolas Delhomme delhomme at embl.de
Wed Jul 25 09:13:28 CEST 2012


Dear Richard,

The warnings are indeed there on purpose. In the first section of the easyRNASeq vignette, in the "usage" subsection, there's a paragraph entitled "Warnings" that explains their meaning. This paragraph is indeed missing from the RnaSeqTutorial vignette; I'll correct that. Thanks for reporting it.

In the example, two kinds of warnings are raised:

1) warnings about overlaps in the annotation:

Their goal is to alert the user to the danger of counting reads multiple times when the annotation used is not adequate, i.e. assigning a sequence read to multiple features, e.g. to slightly different yet overlapping exons. As this task can hardly be automated, it is left to the user to create an annotation set that avoids such problems. I'm currently adding a use case to the vignette that shows how to do so.
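
As a rough illustration of the idea (a minimal sketch in base R on made-up coordinates, not the code easyRNASeq runs internally; the helper merge_overlaps is hypothetical), overlapping exons can be collapsed into disjoint blocks so that a read falling in a shared region is counted only once:

```r
# Toy exon table (made-up coordinates): two overlapping exons and one isolated exon.
exons <- data.frame(
  chr   = c("chr2L", "chr2L", "chr2L"),
  start = c(100L, 150L, 400L),
  end   = c(200L, 250L, 500L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: merge the overlapping intervals of ONE chromosome
# into disjoint blocks.
merge_overlaps <- function(df) {
  df <- df[order(df$start), ]
  # A new block starts whenever a feature begins after the furthest end seen so far.
  furthest_end <- cummax(df$end)
  new_block <- c(TRUE, df$start[-1] > furthest_end[-nrow(df)])
  block <- cumsum(new_block)
  data.frame(
    chr   = df$chr[1],
    start = as.vector(tapply(df$start, block, min)),
    end   = as.vector(tapply(df$end, block, max)),
    stringsAsFactors = FALSE
  )
}

# Apply the merge chromosome by chromosome and bind the results together.
merged <- do.call(rbind, lapply(split(exons, exons$chr), merge_overlaps))

# Number of features that disappeared because they overlapped another one.
n_merged_away <- nrow(exons) - nrow(merged)
print(merged)
print(n_merged_away)
```

With Bioconductor range objects, the reduce() method from IRanges/GenomicRanges performs this kind of merge. Whether merging is the right choice depends on the question being asked of the data, which is why it is left to the user.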

2) warnings about potential naming issues in the input files:

It is (sadly) very common for sequencing facilities to use different naming conventions for the chromosomes they report in the alignment files. As a result, the annotation provided to easyRNASeq often uses different chromosome names than the alignment file. These warnings are there to inform you about this issue. In version 1.2.3, these warnings were not very sophisticated; this has been improved in the development version of easyRNASeq.
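
To make the naming issue concrete, here is a small sketch in base R. The chromosome names and the helper to_ucsc are made up for illustration; this is not the correction easyRNASeq applies internally, only the general idea of comparing the two sets of names and adding the UCSC-style "chr" prefix where it is missing:

```r
# Hypothetical chromosome names: the annotation lacks the "chr" prefix,
# while the alignment file uses UCSC-style names.
annotation_chr <- c("2L", "2R", "3L", "X")
alignment_chr  <- c("chr2L", "chr2R", "chr3L", "chrX")

# Names present in the annotation but absent from the alignments.
unmatched <- setdiff(annotation_chr, alignment_chr)

# Hypothetical helper: add the "chr" prefix only where it is missing.
to_ucsc <- function(x) ifelse(grepl("^chr", x), x, paste0("chr", x))

# After conversion, every annotation name should be found in the alignments.
all_matched <- all(to_ucsc(annotation_chr) %in% alignment_chr)
print(unmatched)
print(all_matched)
```

A real conversion also has to handle exceptions such as the mitochondrial chromosome, which this sketch ignores.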

I hope this answers your question.

Thanks again for your feedback, I'll update the vignettes accordingly.

Cheers,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------





On Jul 24, 2012, at 10:54 PM, Richard Friedman wrote:

> Dear Bioconductor List,
> 
> 	I am new to RNASeq. I am running the script that comes
> with easyRNASeq one command at a time to learn the program.
> I got warning messages. I am wondering if they are indicative of
> something being wrong with my installation because I
> did not expect warnings in a tutorial script. Here is my
> session:
> 
> ###################################################################################
> R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
> Copyright (C) 2012 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> 
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
> 
>  Natural language support but running in an English locale
> 
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
> 
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
> 
> [R.app GUI 1.52 (6188) x86_64-apple-darwin9.8.0]
> 
> [History restored from /Users/friedman/.Rapp.history]
> 
>> getwd()
> [1] "/Users/friedman"
>> library("easyRNASeq")
> Loading required package: parallel
> Loading required package: genomeIntervals
> Loading required package: intervals
> Loading required package: BiocGenerics
> 
> Attaching package: ‘BiocGenerics’
> 
> The following object(s) are masked from ‘package:stats’:
> 
>    xtabs
> 
> The following object(s) are masked from ‘package:base’:
> 
>    anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get, intersect, lapply, Map,
>    mapply, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
>    rownames, sapply, setdiff, table, tapply, union, unique
> 
> Loading required package: Biobase
> Welcome to Bioconductor
> 
>    Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor,
>    see 'citation("Biobase")', and for packages 'citation("pkgname")'.
> 
> Loading required package: biomaRt
> Loading required package: edgeR
> Loading required package: limma
> Loading required package: Biostrings
> Loading required package: IRanges
> 
> Attaching package: ‘IRanges’
> 
> The following object(s) are masked from ‘package:intervals’:
> 
>    reduce
> 
> 
> Attaching package: ‘Biostrings’
> 
> The following object(s) are masked from ‘package:intervals’:
> 
>    type
> 
> Loading required package: BSgenome
> Loading required package: GenomicRanges
> Loading required package: DESeq
> Loading required package: locfit
> locfit 1.5-8 	 2012-04-25
> 
> Attaching package: ‘locfit’
> 
> The following object(s) are masked from ‘package:GenomicRanges’:
> 
>    left, right
> 
> Loading required package: Rsamtools
> Loading required package: ShortRead
> Loading required package: lattice
> Loading required package: latticeExtra
> Loading required package: RColorBrewer
> Warning messages:
> 1: replacing previous import ‘coerce’ when loading ‘intervals’
> 2: replacing previous import ‘initialize’ when loading ‘intervals’
>> library("RnaSeqTutorial")
>> library(BSgenome.Dmelanogaster.UCSC.dm3)
>> 
>> library("easyRNASeq")
>> library("RnaSeqTutorial")
>> library(BSgenome.Dmelanogaster.UCSC.dm3)
>> 
>> count.table <- easyRNASeq(filesDirectory=system.file(
> +                             "extdata",
> +                             package="RnaSeqTutorial"),
> +                           pattern="[A,C,T,G]{6}\\.bam$",
> +                           format="bam",
> +                           readLength=36L,
> +                           organism="Dmelanogaster",
> +                           chr.sizes=as.list(seqlengths(Dmelanogaster)),
> +                           annotationMethod="rda",
> +                           annotationFile=system.file(
> +                             "data",
> +                             "gAnnot.rda",
> +                             package="RnaSeqTutorial"),
> +                           count="exons"
> +                           )
> Checking arguments... 
> Fetching annotations... 
> Summarizing counts... 
> Processing ACACTG.bam 
> Processing ACTAGC.bam 
> Processing ATGGCT.bam 
> Processing TTGCGA.bam 
> Preparing output 
> There were 11 warnings (use warnings() to see them)
>> 
>> 
>> warnings() 
> Warning messages:
> 1: In easyRNASeq(filesDirectory = system.file("extdata",  ... :
>  You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it.
> 2: In easyRNASeq(filesDirectory = system.file("extdata",  ... :
>  There are 50573 features/exons defined in your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want?
> 3: In easyRNASeq(filesDirectory = system.file("extdata",  ... :
>  You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it.
> 4: In fetchCoverage(obj, format = format, filename = file,  ... :
>  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
> 5: In fetchCoverage(obj, format = format, filename = file,  ... :
>  The read length stored in the object (probably provided as argument): 36
> is not the same as the one: 30 determined from the file: /Library/Frameworks/R.framework/Versions/2.15/Resources/library/RnaSeqTutorial/extdata/ACACTG.bam 
> Updating it.
> 6: In fetchCoverage(obj, format = format, filename = file,  ... :
>  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
> 7: In fetchCoverage(obj, format = format, filename = file,  ... :
>  The read length stored in the object (probably provided as argument): 36
> is not the same as the one: 30 determined from the file: /Library/Frameworks/R.framework/Versions/2.15/Resources/library/RnaSeqTutorial/extdata/ACTAGC.bam 
> Updating it.
> 8: In fetchCoverage(obj, format = format, filename = file,  ... :
>  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
> 9: In fetchCoverage(obj, format = format, filename = file,  ... :
>  The read length stored in the object (probably provided as argument): 36
> is not the same as the one: 30 determined from the file: /Library/Frameworks/R.framework/Versions/2.15/Resources/library/RnaSeqTutorial/extdata/ATGGCT.bam 
> Updating it.
> 10: In fetchCoverage(obj, format = format, filename = file,  ... :
>  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
> 11: In fetchCoverage(obj, format = format, filename = file,  ... :
>  The read length stored in the object (probably provided as argument): 36
> is not the same as the one: 30 determined from the file: /Library/Frameworks/R.framework/Versions/2.15/Resources/library/RnaSeqTutorial/extdata/TTGCGA.bam 
> Updating it.
>> 
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] BSgenome.Dmelanogaster.UCSC.dm3_1.3.17 RnaSeqTutorial_0.0.9                  
> [3] easyRNASeq_1.2.3                       ShortRead_1.14.4                      
> [5] latticeExtra_0.6-19                    RColorBrewer_1.0-5                    
> [7] lattice_0.20-6                         Rsamtools_1.8.5                       
> [9] DESeq_1.8.3                            locfit_1.5-8                          
> [11] BSgenome_1.24.0                        GenomicRanges_1.8.7                   
> [13] Biostrings_2.24.1                      IRanges_1.14.4                        
> [15] edgeR_2.6.10                           limma_3.12.1                          
> [17] biomaRt_2.12.0                         Biobase_2.16.0                        
> [19] genomeIntervals_1.12.0                 BiocGenerics_0.2.0                    
> [21] intervals_0.13.3                      
> 
> loaded via a namespace (and not attached):
> [1] annotate_1.34.1      AnnotationDbi_1.18.1 bitops_1.0-4.1       DBI_0.2-5            genefilter_1.38.0   
> [6] geneplotter_1.34.0   grid_2.15.1          hwriter_1.3          RCurl_1.91-1         RSQLite_0.11.1      
> [11] splines_2.15.1       stats4_2.15.1        survival_2.36-14     XML_3.9-4            xtable_1.7-0        
> [16] zlibbioc_1.2.0      
>> 
> 
> ###################################################################################
> 
> I would appreciate any suggestions.
> 
> Thanks and best wishes,
> Rich
> Richard A. Friedman, PhD
> Associate Research Scientist,
> Biomedical Informatics Shared Resource
> Herbert Irving Comprehensive Cancer Center (HICCC)
> Lecturer,
> Department of Biomedical Informatics (DBMI)
> Educational Coordinator,
> Center for Computational Biology and Bioinformatics (C2B2)/
> National Center for Multiscale Analysis of Genomic Networks (MAGNet)
> Room 824
> Irving Cancer Research Center
> Columbia University
> 1130 St. Nicholas Ave
> New York, NY 10032
> (212)851-4765 (voice)
> friedman at cancercenter.columbia.edu
> http://cancercenter.columbia.edu/~friedman/
> 
> "School is an evil plot to suppress my individuality"
> 
> Rose Friedman, age 15
> 
> 
