[BioC] error in report(qa) from pkg ShortRead

Timothée Flutre timothee.flutre at supagro.inra.fr
Fri Sep 12 14:40:11 CEST 2014


Thanks, I removed options(stringsAsFactors=FALSE) from my ~/.Rprofile!
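
For reference, the failure described in the quoted reply below can be reproduced in base R without ShortRead. This is only a sketch: the one-column data frames are hypothetical stand-ins for qa[["adapterContamination"]], and the "%.3g" format is taken from the error message, not from the package source.

```r
# Hypothetical one-column tables mimicking qa[["adapterContamination"]].
fmt <- "%.3g"
df_num <- data.frame(contamination = 0.0123)
df_chr <- data.frame(contamination = "0.0123", stringsAsFactors = FALSE)

# Numeric column: sprintf() accepts the numeric format.
ok <- as.data.frame(lapply(df_num, sprintf, fmt = fmt))

# Character column: sprintf() rejects the numeric format, so the
# error message is captured here instead of stopping the session.
err <- tryCatch(
    as.data.frame(lapply(df_chr, sprintf, fmt = fmt)),
    error = function(e) conditionMessage(e)
)

print(ok)
print(err)
```

The character case is what a global stringsAsFactors=FALSE setting appears to have produced in my session.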

When I have several files, I encounter the following error:
> files <- dir("~/test", "*.fastq.gz$", full=TRUE)
> qas <- qaSummary(files, type="fastq.gz")
Error: could not find function "qaSummary"

This happens even though qaSummary() appears in the Overview vignette
(http://www.bioconductor.org/packages/release/bioc/vignettes/ShortRead/inst/doc/Overview.pdf).
I guess that qaSummary() is in fact deprecated in favor of qa(), right?
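
A quick way to check whether a function named in a vignette is exported by the installed version of a package is to look at the namespace exports. The helper name is_exported() below is my own (hypothetical), and the example runs against utils so that it works without ShortRead installed.

```r
# TRUE if package `pkg` is installed and exports an object named `fn`.
# `is_exported` is a hypothetical helper, not part of any package.
is_exported <- function(pkg, fn) {
    requireNamespace(pkg, quietly = TRUE) &&
        fn %in% getNamespaceExports(pkg)
}

print(is_exported("utils", "txtProgressBar"))
print(is_exported("utils", "qaSummary"))

# With ShortRead installed, the same check could be run as:
# is_exported("ShortRead", "qaSummary")
# is_exported("ShortRead", "qa")
```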

Moreover, when fed several large fastq files, qa() seems much
slower than FastQC. Would it be possible to add a progress bar to
qa()? For instance, via txtProgressBar()
(http://stat.ethz.ch/R-manual/R-patched/library/utils/html/txtProgressBar.html)
or the pbapply package (http://cran.r-project.org/web/packages/pbapply/)?
I had a quick look at the ShortRead source code, but couldn't easily
find where to add this.
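
To make the request concrete, here is a minimal sketch of what I have in mind, using only utils::txtProgressBar(). It loops over files serially, so it does not reflect how qa() dispatches work through bplapply(). The names with_progress() and process_one() are hypothetical, and process_one() is a placeholder for the real per-file step.

```r
# Hypothetical stand-in for a per-file QA step; swap in the real call.
process_one <- function(path) nchar(basename(path))

# Apply `fun` to each path, updating a text progress bar after each file.
# Assumes at least one path (txtProgressBar needs max > min).
with_progress <- function(paths, fun) {
    results <- vector("list", length(paths))
    pb <- utils::txtProgressBar(min = 0, max = length(paths), style = 3)
    on.exit(close(pb))
    for (i in seq_along(paths)) {
        results[[i]] <- fun(paths[[i]])
        utils::setTxtProgressBar(pb, i)
    }
    names(results) <- basename(paths)
    results
}

files <- c("~/test/a.fastq.gz", "~/test/b.fastq.gz")  # placeholder paths
res <- with_progress(files, process_one)
```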

I also tried to get a sense of the time it takes to process a single
file, but encountered the following error:
> system.time(qa <- qa(dirPath="~/test",
+   pattern="RPI2_S1_L001_R1_001.fastq.gz", type="fastq", sample=TRUE))
    user  system elapsed
  26.719   0.490  22.565
> system.time(qa <- qa(dirPath="~/test",
+   pattern="RPI2_S1_L001_R1_001.fastq.gz", type="fastq", sample=FALSE))
Error: 1 errors; first error:
   Error: UserArgumentMismatch
   'pattern' must be 'character(0) or character(1)'
For more information, use bplasterror(). To resume calculation, re-call
   the function and set the argument 'BPRESUME' to TRUE or wrap the
   previous call in bpresume().
First traceback:
   28: system.time(qa <- qa(dirPath = "~/test",
           pattern = "RPI2_S1_L001_R1_001.fastq.gz", type = "fastq",
           sample = FALSE))
   27: qa(dirPath = "~/test",
           pattern = "RPI2_S1_L001_R1_001.fastq.gz", type = "fastq",
           sample = FALSE)
   26: qa(dirPath = "~/test",
           pattern = "RPI2_S1_L001_R1_001.fastq.gz", type = "fastq",
           sample = FALSE)
   25: .local(dirPath, ...)
   24: .qa_fastq(dirPath, pattern, type = type, ...)
   23: bplapply(fls, .qa_fastq_lane, type = type, ..., verbose = verbose)
   22: bplapply(fls
Timing stopped at: 0.013 0 0.013
> bplasterror()
0 / 1 partial results stored. First 1 error messages:
[1]: Error: UserArgumentMismatch
   'pattern' must be 'character(0) or character(1)'

I don't understand why the same command works with sample=TRUE, but  
doesn't with sample=FALSE.
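
As a side note on the timing itself, a cleaner pattern than the one I used above might be to store the result under a name different from the function being called, and to pull out the elapsed time explicitly. The sketch below uses a hypothetical slow_task() in place of qa(), so it says nothing about the sample=FALSE error.

```r
# Hypothetical stand-in for the expensive call being timed.
slow_task <- function(n) {
    Sys.sleep(0.05)
    sum(seq_len(n))
}

# Store the result under its own name so the function name is not reused.
timing <- system.time(result <- slow_task(1000))

# system.time() returns a named vector; "elapsed" is wall-clock seconds.
elapsed <- timing[["elapsed"]]
print(elapsed)
```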

Timothée Flutre
Chargé de Recherche / Research Scientist
INRA - Centre de Montpellier
  http://umr-agap.cirad.fr/en
  http://openwetware.org/wiki/User:Timothee_Flutre

"Martin Morgan" <mtmorgan at fhcrc.org> wrote:

> On 09/11/2014 08:38 AM, Timothée Flutre wrote:
>> Hello,
>>
>> I have a fastq file compressed with gzip in a directory named test/.
>> I would like to assess its quality. Here is what I do:
>>
>> $ R
>>> library(ShortRead)
>>> qa <- qa("~/test", "fastq.gz")
>>> report(qa, dest="~/test")
>>
>> And I get the following error message:
>> Error in as.data.frame(lapply(df, sprintf, fmt = fmt)) :
>> error in evaluating the argument 'x' in selecting a method for
>> function 'as.data.frame': Error in FUN(X[[1L]], ...) :
>> invalid format '%.3g'; use format %s for character objects
>>
>> Here are more details:
>>> traceback()
>> 7: as.data.frame(lapply(df, sprintf, fmt = fmt))
>> 6: .df2a(qa[["adapterContamination"]])
>> 5: hwrite(.df2a(qa[["adapterContamination"]]), border = 0)
>
> I see you are in the half of the R users who dislike factors! I think
>
>   qa[["adapterContamination"]]
>
> should be a data.frame with a single column 'contamination', and that
> the single column should be a factor or numeric; I think you have set
>
>   options(stringsAsFactors=FALSE)
>
> and so instead of a factor or numeric it is character.
>
> The workaround is to set options(stringsAsFactors=TRUE) (or not set
> this option at all!). This will be fixed in the next release of ShortRead.
>
> Thanks for the report, and sorry for the inconvenience.
>
> Martin
>
>> 4: func(x, dest, type, ...)
>> 3: func(x, dest, type, ...)
>> 2: report(qa, dest = "~/test")
>> 1: report(qa, dest = "~/test")
>>
>>> sessionInfo()
>> R version 3.1.0 (2014-04-10)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>>  [1] ShortRead_1.22.0        GenomicAlignments_1.0.2 BSgenome_1.32.0
>>  [4] Rsamtools_1.16.1        GenomicRanges_1.16.3    GenomeInfoDb_1.0.2
>>  [7] Biostrings_2.32.1       XVector_0.4.0           IRanges_1.22.9
>> [10] BiocParallel_0.6.1      BiocGenerics_0.10.0
>>
>> loaded via a namespace (and not attached):
>>  [1] BatchJobs_1.3       BBmisc_1.7          Biobase_2.24.0
>>  [4] bitops_1.0-6        brew_1.0-6          checkmate_1.1
>>  [7] codetools_0.2-8     compiler_3.1.0      DBI_0.2-7
>> [10] digest_0.6.4        fail_1.2            foreach_1.4.2
>> [13] grid_3.1.0          hwriter_1.3.1       iterators_1.0.7
>> [16] lattice_0.20-29     latticeExtra_0.6-26 RColorBrewer_1.0-5
>> [19] RSQLite_0.11.4      sendmailR_1.1-2     stats4_3.1.0
>> [22] stringr_0.6.2       tools_3.1.0         zlibbioc_1.10.0
>>
>
>
> -- 
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793
>


