[BioC] QuasR: problem accessing BSgenome.Rnorvegicus.UCSC.rn5

Hans-Rudolf Hotz hrh at fmi.ch
Wed Mar 12 13:39:38 CET 2014


Hi Guido

Michael is on holiday this week. In the meantime, I will try to reply to 
the best of my knowledge. I am sure Michael will reply with a better 
answer once he is back.

We hardly work with rat, so I just downloaded the corresponding BSgenome 
and TxDb packages (from devel). And indeed there is a mismatch between the 
seqlevels of the BSgenome and TxDb packages:

 > seqlevels(Rnorvegicus)
  [1] "chr1"  "chr2"  "chr3"  "chr4"  "chr5"  "chr6"  "chr7"  "chr8"  "chr9"
[10] "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18"
[19] "chr19" "chr20" "chrX"  "chrM"
 > seqlevels(TxDb.Rnorvegicus.UCSC.rn5.refGene)[1:30]
  [1] "chr1"                     "chr2"
  [3] "chr3"                     "chr4"
  [5] "chr5"                     "chr6"
  [7] "chr7"                     "chr8"
  [9] "chr9"                     "chr10"
[11] "chr11"                    "chr12"
[13] "chr13"                    "chr14"
[15] "chr15"                    "chr16"
[17] "chr17"                    "chr18"
[19] "chr19"                    "chr20"
[21] "chrX"                     "chrM"
[23] "chr1_AABR06109291_random" "chr1_AABR06109292_random"
[25] "chr1_AABR06109293_random" "chr1_AABR06109294_random"
[27] "chr1_AABR06109295_random" "chr1_AABR06109296_random"
[29] "chr1_AABR06109297_random" "chr1_AABR06109298_random"
 >
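
The two listings above can be compared programmatically. Here is a minimal 
sketch in base R, assuming the seqlevels have been copied into plain 
character vectors (the names bsgenome_levels and txdb_levels are only for 
illustration, and the TxDb vector holds just the first 30 entries shown 
above, so the real difference is larger):

```r
# Seqlevels copied from the console output above (TxDb list truncated to 30)
bsgenome_levels <- c(paste0("chr", 1:20), "chrX", "chrM")
txdb_levels <- c(bsgenome_levels,
                 paste0("chr1_AABR061092", 91:98, "_random"))

# Levels present in the TxDb but absent from the BSgenome
extra_levels <- setdiff(txdb_levels, bsgenome_levels)

# Levels present in the BSgenome but absent from the TxDb
missing_levels <- setdiff(bsgenome_levels, txdb_levels)

print(extra_levels)
print(length(missing_levels))
```

With the real objects, the same setdiff() calls should work on the results 
of seqlevels(TxDb.Rnorvegicus.UCSC.rn5.refGene) and seqlevels(Rnorvegicus).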

As a quick fix, I recommend restricting the seqlevels of the TxDb, e.g.:


seqlevels(TxDb.Rnorvegicus.UCSC.rn5.refGene, force=TRUE) <- 
seqlevels(Rnorvegicus)
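
To illustrate what this restriction amounts to, here is a rough base R 
analogue on a toy table. It is only a sketch: the data frame "features" and 
its values are made up and stand in for the TxDb content. As far as I 
understand it, the effect is that features on sequences outside the kept 
set are no longer reported:

```r
# Hypothetical feature table standing in for the TxDb content
features <- data.frame(
  seqname = factor(c("chr1", "chr2", "chr1_AABR06109291_random", "chrM")),
  start   = c(100L, 200L, 50L, 10L),
  stringsAsFactors = FALSE
)

# Sequence levels to keep (those of the BSgenome shown above)
keep_levels <- c(paste0("chr", 1:20), "chrX", "chrM")

# Drop features on other sequences, then reset the factor levels
pruned <- features[features$seqname %in% keep_levels, , drop = FALSE]
pruned$seqname <- factor(as.character(pruned$seqname), levels = keep_levels)

print(pruned)
```

The feature on chr1_AABR06109291_random is dropped, and the remaining 
features only refer to sequences that exist in the alignments.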


Hope this helps,
Regards, Hans-Rudolf



 > sessionInfo()
R Under development (unstable) (2014-01-17 r64817)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
  [1] TxDb.Rnorvegicus.UCSC.rn5.refGene_2.10.1
  [2] GenomicFeatures_1.15.9
  [3] AnnotationDbi_1.25.14
  [4] Biobase_2.23.6
  [5] BSgenome.Rnorvegicus.UCSC.rn5_1.3.99
  [6] BSgenome_1.31.12
  [7] Biostrings_2.31.14
  [8] XVector_0.3.7
  [9] GenomicRanges_1.15.38
[10] GenomeInfoDb_0.99.19
[11] IRanges_1.21.34
[12] BiocGenerics_0.9.3
[13] BiocInstaller_1.13.3

loaded via a namespace (and not attached):
  [1] BatchJobs_1.2             BBmisc_1.5
  [3] BiocParallel_0.5.17       biomaRt_2.19.3
  [5] bitops_1.0-6              brew_1.0-6
  [7] codetools_0.2-8           DBI_0.2-7
  [9] digest_0.6.4              fail_1.2
[11] foreach_1.4.1             GenomicAlignments_0.99.32
[13] iterators_1.0.6           plyr_1.8.1
[15] Rcpp_0.11.0               RCurl_1.95-4.1
[17] Rsamtools_1.15.33         RSQLite_0.11.4
[19] rtracklayer_1.23.16       sendmailR_1.1-2
[21] stats4_3.1.0              stringr_0.6.2
[23] tools_3.1.0               XML_3.98-1.1
[25] zlibbioc_1.9.0
 >




On 03/12/2014 12:20 PM, Hooiveld, Guido wrote:
> Dear Michael,
> Sorry to bother you with this, but I face a problem using QuasR which I can't solve:
> I would like to summarize my reads into a count table, but I got stuck... An error is thrown that some queries cannot be found.
> I generated my project essentially as described in the vignette using the unmasked BSgenome package in R-dev, and would then like to annotate it using BioC's rat TxDb. Could this be due to a mismatch between the content of the BSgenome and TxDb packages (the former is dated later than the latter in R-dev)?
> Any suggestion would be appreciated!
>
> Thanks,
> Guido
>
> sampleFile <- "samples_GH2.txt"
> genomeFile <- "BSgenome.Rnorvegicus.UCSC.rn5"
> proj2 <- qAlign(sampleFile=sampleFile, genome=genomeFile)
>
>
>> geneLevels <- qCount(proj2, TxDb.Rnorvegicus.UCSC.rn5.refGene,reportLevel="gene")
> Error in qCount(proj2, TxDb.Rnorvegicus.UCSC.rn5.refGene, reportLevel = "gene") :
>    sequence levels in 'query' not found in alignment files: chr1_AABR06109291_random, chr1_AABR06109292_random, chr1_AABR06109293_random, chr1_AABR06109294_random, chr1_AABR06109295_random, chr1_AABR06109296_random, chr1_AABR06109297_random, chr1_AABR06109298_random, chr1_AABR06109299_random, chr1_AABR06109300_random, chr1_AABR06109301_random, chr1_AABR06109302_random, chr1_AABR06109303_random, chr1_AABR06109307_random, chr1_AABR06109308_random, chr1_AABR06109309_random, chr1_AABR06109310_random, chr1_AABR06109311_random, chr1_AABR06109312_random, chr1_AABR06109313_random, chr1_AABR06109314_random, chr1_AABR06109315_random, chr1_AABR06109316_random, chr1_AABR06109317_random, chr1_AABR06109322_random, chr1_AABR06109323_random, chr1_AABR06109324_random, chr1_AABR06109325_random, chr1_AABR06109331_random, chr1_AABR06109332_random, chr1_AABR06109333_random, chr1_AABR06109334_random, chr1_AABR06109335_random, chr1_AABR06109336_random, chr1_AABR06109337_random, chr1_AABR06109340_rando
>>
>
>> sessionInfo()
> R Under development (unstable) (2013-11-19 r64265)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] TxDb.Rnorvegicus.UCSC.rn5.refGene_2.10.1 BiocInstaller_1.13.3
>   [3] QuasR_1.3.13                             Rbowtie_1.3.1
>   [5] rtracklayer_1.23.15                      GenomicFeatures_1.15.9
>   [7] AnnotationDbi_1.25.14                    Biobase_2.23.6
>   [9] GenomicRanges_1.15.38                    GenomeInfoDb_0.99.19
> [11] IRanges_1.21.34                          BiocGenerics_0.9.3
>
> loaded via a namespace (and not attached):
>   [1] BatchJobs_1.2             BBmisc_1.5                BiocParallel_0.5.8
>   [4] biomaRt_2.19.3            Biostrings_2.31.14        bitops_1.0-6
>   [7] brew_1.0-6                BSgenome_1.31.12          codetools_0.2-8
> [10] DBI_0.2-7                 digest_0.6.4              fail_1.2
> [13] foreach_1.4.1             GenomicAlignments_0.99.32 grid_3.1.0
> [16] hwriter_1.3               iterators_1.0.6           lattice_0.20-24
> [19] latticeExtra_0.6-26       plyr_1.8.1                RColorBrewer_1.0-5
> [22] Rcpp_0.11.0               RCurl_1.95-4.1            Rsamtools_1.15.33
> [25] RSQLite_0.11.4            sendmailR_1.1-2           ShortRead_1.21.16
> [28] stats4_3.1.0              stringr_0.6.2             tools_3.1.0
> [31] XML_3.98-1.1              XVector_0.3.7             zlibbioc_1.9.0
>>
>
> -----Original Message-----
> From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of Michael Stadler
> Sent: Tuesday, March 04, 2014 13:20
> To: bioconductor at r-project.org
> Subject: Re: [BioC] QuasR: problem accessing BSgenome.Rnorvegicus.UCSC.rn5
>
> Hi Guido and Herve,
>
> You were both spot on. In the development version 1.3.9 of QuasR, we adapted to the new (BioC 2.14) type of BSgenome packages, so QuasR >=
> 1.3.9 only works with these.
>
> One clarification regarding the treatment of masks in QuasR:
>
> - QuasR <= 1.2.x has ignored masks in BSgenome packages
>    during alignment
>
> - QuasR >= 1.3.9 now handles BSgenome objects with or without masks,
>    so that the following statement:
>
>      qAlign(..., genome="BSgenome.Rnorvegicus.UCSC.rn5")
>
>    is equivalent to the old behaviour (no masking), but the statement:
>
>      qAlign(..., genome="BSgenome.Rnorvegicus.UCSC.rn5.masked")
>
>    now aligns against a masked genome.
>
> I hope this helps.
>
> Cheers,
> Michael
>
>
>
> On 03.03.2014 14:26, Hooiveld, Guido wrote:
>> Hi Herve,
>> Good point.
>> I checked and version 1.3.17 was installed because that (still) is the latest (binary) version of the package available for Windows. I meanwhile re-installed the BSgenome package from source, and now QuasR is working on my Win7 machine as it should be (thus with v1.3.99). Based on your comments I am currently using the masked file, because that is the equivalent of the old file.
>>
>> Thanks again,
>> Guido
>>
>>
>> -----Original Message-----
>> From: Hervé Pagès [mailto:hpages at fhcrc.org]
>> Sent: Sunday, March 02, 2014 02:57
>> To: Hooiveld, Guido; bioconductor at r-project.org
>> Subject: Re: [BioC] QuasR: problem accessing
>> BSgenome.Rnorvegicus.UCSC.rn5
>>
>> Hi Guido,
>>
>> When using BioC devel, things can move fast so it's important that you update your packages often (with biocLite()) in order to keep everything in sync. In your case it looks like the version of the BSgenome package you have (1.3.17) is lagging behind the version currently in BioC devel (1.3.99).
>>
>> Note that starting with BioC 2.14 (which will be released in April,
>> but corresponds to BioC devel at the moment), many BSgenome packages
>> exist in 2 flavors: raw genome or masked genome. For example, for rn5,
>> there is now
>>
>>     BSgenome.Rnorvegicus.UCSC.rn5              raw genome
>>     BSgenome.Rnorvegicus.UCSC.rn5.masked       masked genome
>>
>> BSgenome.Rnorvegicus.UCSC.rn5.masked is equivalent to the old
>> BSgenome.Rnorvegicus.UCSC.rn5 in BioC <= 2.13 which was already masked. However, in BioC <= 2.13, there was no non-masked version of rn5. See announcement here for more details:
>>
>>     https://stat.ethz.ch/pipermail/bioc-devel/2014-January/005150.html
>>
>> I don't know if QuasR cares about the masks though. Maybe they're just ignored, in which case I guess you could just stick to BSgenome.Rnorvegicus.UCSC.rn5.
>>
>> Cheers,
>> H.
>>
>>
>> On 02/28/2014 03:44 PM, Hooiveld, Guido wrote:
>>> Hello,
>>> I am using R-dev, and would like to run QuasR to align an RNA-seq experiment.
>>> Unfortunately, I can't get past the indexing step because somehow BSgenome cannot be accessed by QuasR.
>>> I think this is because it can be accessed by using "Rnorvegicus" rather than by (the expected) "BSgenome.Rnorvegicus.UCSC.rn5".
>>>
>>> Is this to be changed in QuasR, or the BSgenome?
>>>
>>> Thanks,
>>> Guido
>>>
>>>
>>>> library(QuasR)
>>>> library(BSgenome)
>>>> library(Rsamtools)
>>>> library(rtracklayer)
>>>> library(GenomicFeatures)
>>>> library(BSgenome.Rnorvegicus.UCSC.rn5)
>>>> sampleFile <- "samples_GH2.txt"
>>>> genomeFile <- "BSgenome.Rnorvegicus.UCSC.rn5"
>>>>
>>>> proj <- qAlign(sampleFile=sampleFile, genome=genomeFile)
>>> alignment files missing - need to:
>>>       create alignment index for the genome
>>>       create 18 genomic alignment(s)
>>> will start in ..9s..8s..7s..6s..5s..4s..3s..2s..1s
>>> Error in get(genome) : object 'BSgenome.Rnorvegicus.UCSC.rn5' not
>>> found
>>>>
>>>
>>> # The info is there and the object is accessible, but QuasR cannot
>>> make use of it
>>>> Rnorvegicus
>>> Rat genome
>>> |
>>> | organism: Rattus norvegicus (Rat)
>>> | provider: UCSC
>>> | provider version: rn5
>>> | release date: Mar. 2012
>>> | release name: RGSC 5.0
>>> |
>>> | single sequences (see '?seqnames'):
>>> |   chr1   chr2   chr3   chr4   chr5   chr6   chr7   chr8   chr9   chr10  chr11
>>> |   chr12  chr13  chr14  chr15  chr16  chr17  chr18  chr19  chr20  chrX   chrM
>>> |
>>> | multiple sequences (see '?mseqnames'):
>>> |   random        chrUn         upstream1000  upstream2000  upstream5000
>>> |
>>> | (use the '$' or '[[' operator to access a given sequence)
>>>> seqlengths(Rnorvegicus)
>>>        chr1      chr2      chr3      chr4      chr5      chr6      chr7      chr8
>>> 290094216 285068071 183740530 248343840 177180328 156897508 143501887 132457389
>>>        chr9     chr10     chr11     chr12     chr13     chr14     chr15     chr16
>>> 121549591 112200500  93518069  54450796 118718031 115151701 114627140  90051983
>>>       chr17     chr18     chr19     chr20      chrX      chrM
>>>    92503511  87229863  72914587  57791882 154597545     16313
>>>>
>>>
>>>> genomeFile <- "Rnorvegicus"
>>>> proj <- qAlign(sampleFile=sampleFile, genome=genomeFile)
>>> The specified genome is not a fasta file or an installed BSgenome.
>>> Connecting to Bioconductor and searching for a matching genome
>>> (internet connection required)...OK Bioconductor version 2.14
>>> (BiocInstaller 1.13.3), ?biocLite for help
>>> Error: Rnorvegicus is not available in Bioconductor. Type
>>> available.genomes() for a complete list
>>>>
>>>
>>>> sessionInfo()
>>> R Under development (unstable) (2013-11-19 r64265)
>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
>>> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>> [5] LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] BiocInstaller_1.13.3                 BSgenome.Rnorvegicus.UCSC.rn5_1.3.17
>>> [3] GenomicFeatures_1.15.7               AnnotationDbi_1.25.9
>>>    [5] Biobase_2.23.5                       rtracklayer_1.23.14
>>>    [7] Rsamtools_1.15.29                    BSgenome_1.31.12
>>>    [9] Biostrings_2.31.14                   XVector_0.3.7
>>> [11] QuasR_1.3.12                         Rbowtie_1.3.0
>>> [13] GenomicRanges_1.15.31                IRanges_1.21.32
>>> [15] BiocGenerics_0.9.3
>>>
>>> loaded via a namespace (and not attached):
>>> [1] BatchJobs_1.2             BBmisc_1.5
>>>    [3] BiocParallel_0.5.8        biomaRt_2.19.3
>>>    [5] bitops_1.0-6              brew_1.0-6
>>>    [7] codetools_0.2-8           DBI_0.2-7
>>>    [9] digest_0.6.4              fail_1.2
>>> [11] foreach_1.4.1             GenomicAlignments_0.99.26
>>> [13] grid_3.1.0                hwriter_1.3
>>> [15] iterators_1.0.6           lattice_0.20-24
>>> [17] latticeExtra_0.6-26       plyr_1.8.1
>>> [19] RColorBrewer_1.0-5        Rcpp_0.11.0
>>> [21] RCurl_1.95-4.1            RSQLite_0.11.4
>>> [23] sendmailR_1.1-2           ShortRead_1.21.14
>>> [25] stats4_3.1.0              stringr_0.6.2
>>> [27] tools_3.1.0               XML_3.98-1.1
>>> [29] zlibbioc_1.9.0
>>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fhcrc.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
>
> --
> --------------------------------------------
> Michael Stadler, PhD
> Head of Computational Biology
> Friedrich Miescher Institute
> Basel (Switzerland)
> Phone : +41 61 697 6492
> Fax   : +41 61 697 3976
> Mail  : michael.stadler at fmi.ch
>


