[BioC] ShortRead qa() error: Error in density.default(qscore) : 'x' contains missing values

Martin Morgan mtmorgan at fhcrc.org
Wed Nov 20 05:34:54 CET 2013


On 11/19/2013 08:11 PM, Sam McInturf wrote:
> Hello everyone,
>     I am using ShortRead to do some pre-processing for 100bp Illumina reads.
>   I used qa() to read to do an upfront analysis, and then used trimEnds() to

I'd guess that trimEnds actually trims the ends of some reads to zero length; 
these are retained by default, whereas probably you'd like to drop them. I'll 
work on a bug fix into the devel branch, but a workaround might be to...

> trim my reads by quality, and now I would like to redo the qa() to see a
> 'before and after' type of thing.  After running it I get this error:
>
>> qas <- lapply(seq_along(fls), function(i, fls) qa(readFastq(fls[i]),
> names(fls)[i]), fls)

only call qa on reads with width != 0, rfq = readFastq(fls[i]); 
qa(rfq[width(rfq) != 0])

If this is for 10's of millions of reads a better strategy might be to run qa on 
the files directly (this might not help in this particular situation)

   qa(directory_for_fastq, pattern_defining_files, type="fastq")

or rfq = yield(FastqSampler(fls[i])); qa(rfq[width(rfq) != 0])

which will draw a random sample rather than read all reads into memory.

Martin

> Error in density.default(qscore) : 'x' contains missing values
> Calls: lapply ... .qa_qdensity -> density -> density -> density.default
> Execution halted
>
> I ran the first 100 reads of one of the files and it worked just fine.
>
> My scripts look like this:
> To trim the reads
> the first few lines are just parsing to get a nice looking name for the
> 'newfls'
>
> oldfls <-
> c("/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L005_R1_001.fastq",
> "/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L006_R1_001.fastq",
> "/data/smcintur/RNASeq/NMseq/Sample_CRT1/CRT1_ATTCCT_L007_R1_001.fastq")
> flsSplit <- strsplit(oldfls, "/")
> FileSplit <- unlist(lapply(flsSplit, function(x) paste(x[1:6], collapse =
> "/")))
> readFiles <- unlist(lapply(flsSplit, function(x) x[7]))
> newfls <-
> paste("/scratch/smcintur/Sample_CRT1/",unlist(lapply(strsplit(readFiles,
> "_"),
>                                   function(x) paste(x[1], "Clean",  x[3],
> sep = "_"))),sep ="")
> newfls <- paste(newfls, ".fastq", sep = "")
>
> reads <- readFastq(oldfls[1])
> treads <- trimEnds(reads, Cutoff)
> rm(reads)
> writeFastq(treads, file = newfls[1])
> rm(treads)
>
> and then the same thing for the following two reads.
>
>
> And then to check quality
> again, parsing up front for pretty names, and then a qa() call.
> Note that this script worked just fine to process the reads before they were
>
> library(ShortRead)
> fls <- list.files(pattern = ".fastq")
> flsSplit <- strsplit(fls, split = "_")
> Lib <- lapply(flsSplit, function(x) x[1])
> Lane <- lapply(flsSplit, function(x) x[3])
> names(fls) <- paste(Lib, Lane, sep = "_")
> names(fls)  <- gsub(".fastq", "", names(fls))
> outname <- paste("QA_", Lib[1], ".rda", sep = "")
>
> qas <- lapply(seq_along(fls), function(i, fls) qa(readFastq(fls[i]),
> names(fls)[i]), fls)
> QA <- do.call(rbind, qas)
> save(QA, file = outname)
>
> Does anyone have an idea of what is going on?
>
>
>
>
>> sessionInfo()
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] ShortRead_1.18.0     latticeExtra_0.6-24  RColorBrewer_1.0-5
> [4] Rsamtools_1.12.3     lattice_0.20-15      Biostrings_2.28.0
> [7] GenomicRanges_1.12.4 IRanges_1.18.1       BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.20.0 bitops_1.0-5   grid_3.0.0     hwriter_1.3
>   stats4_3.0.0
> [6] zlibbioc_1.6.0
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list