[BioC] trimTails function in ShortRead package give different results on the same input

Martin Morgan mtmorgan at fhcrc.org
Wed Oct 17 23:21:57 CEST 2012


On 10/15/2012 5:34 AM, Zhenyu Xu wrote:
> Hi ShortRead package developer,
>
>      I tried to use the function trimTails to trim low-quality bases from reads produced by a 454 sequencing machine. However, I get different results when I run the command several times, starting from the same ShortReadQ object and the same trimming parameters. I observe this on CentOS Linux machines (6.2 and 6.3). I also tried this on my own Mac, but there the results are identical. So the problem seems to be restricted to CentOS Linux machines (I am not sure whether other Linux platforms are affected). The data set (~11 MB) can be downloaded at http://dl.dropbox.com/u/68829208/454reads.rds.


Thank you for the bug report, data, and reproducible example. This has been 
fixed in ShortRead 1.16.1 and in the devel branch, and should be available via 
biocLite after about 10am Seattle time, tomorrow.

The problem was only with successive=TRUE.
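
For anyone who wants to confirm on their own machine that repeated calls now
agree, here is a minimal sketch of a repeatability check. The helper name
isRepeatable and its arguments are hypothetical (it is not part of ShortRead);
it simply calls a function several times and compares each result with the
first.

```r
## Hypothetical helper, not part of ShortRead: call `f` a total of `n`
## times and report whether every result matches the first one.
isRepeatable <- function(f, n = 5L, same = identical) {
    first <- f()
    for (i in seq_len(n - 1L)) {
        if (!same(first, f()))
            return(FALSE)   # at least one call disagreed with the first
    }
    TRUE
}

## Sanity check with a deterministic function
isRepeatable(function() sum(1:10))
```

With ShortRead loaded and readsSub read in as in the report below, the same
check could be applied to the read widths, e.g.
isRepeatable(function() width(trimTails(readsSub, 20, "5", successive=TRUE))).
Comparing widths rather than whole objects mirrors the width(x) != width(y)
comparison in the original report.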

Martin

> best,
> zhenyu
>
> Please see the following session transcript:
>
> wget http://dl.dropbox.com/u/68829208/454reads.rds
> R
>
> R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
> Copyright (C) 2012 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
>    Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
>> library(ShortRead)
> Loading required package: BiocGenerics
>
> Attaching package: ‘BiocGenerics’
>
> The following object(s) are masked from ‘package:stats’:
>
>      xtabs
>
> The following object(s) are masked from ‘package:base’:
>
>      anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
>      get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
>      pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
>      rownames, sapply, setdiff, table, tapply, union, unique
>
> Loading required package: IRanges
> Loading required package: GenomicRanges
> Loading required package: Biostrings
> Loading required package: lattice
> Loading required package: Rsamtools
> Loading required package: latticeExtra
> Loading required package: RColorBrewer
>> readsSub <- readRDS("454reads.rds")
>> readsSub
> class: ShortReadQ
> length: 5460 reads; width: 5..424 cycles
>> trimTails(readsSub, 20, "5", successive=TRUE)
> class: ShortReadQ
> length: 5460 reads; width: 3..416 cycles
>> trimTails(readsSub, 20, "5", successive=TRUE)
> class: ShortReadQ
> length: 5460 reads; width: 3..416 cycles
>> trimTails(readsSub, 20, "5", successive=TRUE)
> class: ShortReadQ
> length: 5460 reads; width: 4..424 cycles
>> trimTails(readsSub, 20, "5", successive=TRUE)
> class: ShortReadQ
> length: 5460 reads; width: 5..416 cycles
>> trimTails(readsSub, 20, "5", successive=TRUE)
> class: ShortReadQ
> length: 5460 reads; width: 4..424 cycles
>> x = trimTails(readsSub, 20, "5", successive=TRUE)
>> y = trimTails(readsSub, 20, "5", successive=TRUE)
>> sum(width(x)!=width(y))
> [1] 1325
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>   [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] ShortRead_1.14.4    latticeExtra_0.6-19 RColorBrewer_1.0-5
> [4] Rsamtools_1.8.5     lattice_0.20-6      Biostrings_2.24.1
> [7] GenomicRanges_1.8.9 IRanges_1.14.4      BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.16.0 bitops_1.0-4.1 grid_2.15.1    hwriter_1.3    stats4_2.15.1
> [6] zlibbioc_1.2.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


-- 
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109


