[BioC] QuasR: how to use an indexed reference genome?

Paul Shannon pshannon at fhcrc.org
Thu May 16 22:48:47 CEST 2013


I am new to QuasR, and also quite new to aligning short reads to reference genomes more generally.
I cannot figure out how to use a pre-built indexed reference genome file with QuasR. The examples supplied with the package work nicely.
Scaling up to using all of hg19 raises problems for me.  I apologize if I am missing the obvious.

To illustrate the problem, I call QuasR's qAlign method with just two arguments (quoting from the man page):

    sampleFile:  a text file listing input sequence files and  sample names

    genome: the reference genome for primary alignments, one of:

            * a string referring to a "BSgenome" package (e.g.
              "BSgenome.Hsapiens.UCSC.hg19"), which will be
              downloaded automatically from Bioconductor if not present

            * the name of a fasta sequence file containing one or
              several sequences (chromosomes) to be used as a reference
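
For concreteness, a minimal sketch of such a two-argument call follows. The file names are placeholders rather than paths from this post, and the tab-delimited layout with FileName and SampleName columns is my understanding of the sample file format for single-end reads, so treat it as an assumption. The alignment itself is switched off by default so the snippet can be run without QuasR or any read files.

```r
# Sketch only: file names are placeholders, not the paths used in this post.
sampleFile <- file.path(tempdir(), "samples.txt")

# Assumed sample file layout: tab-delimited, one row per input sequence file,
# with FileName and SampleName columns.
samples <- data.frame(
  FileName   = c("reads_1.fastq", "reads_2.fastq"),
  SampleName = c("Sample1", "Sample2"),
  stringsAsFactors = FALSE
)
write.table(samples, sampleFile, sep = "\t", quote = FALSE, row.names = FALSE)

# Either a BSgenome package name or the name of a fasta file.
genome <- "BSgenome.Hsapiens.UCSC.hg19"

# Off by default: set to TRUE only once QuasR is installed and the reads exist.
run_alignment <- FALSE
if (run_alignment) {
  library(QuasR)
  proj <- qAlign(sampleFile, genome)
}
```

With run_alignment left at FALSE, the snippet only writes the sample file, which makes it easy to inspect the layout before committing to a long indexing run.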

QuasR apparently invokes the bowtie indexing program when supplied either of the two "genome" options: a BSgenome package or a fasta file. But since indexing takes a long time (hours, apparently), I hoped to provide a ready-made index file instead, and found some described here:


   http://bowtie-bio.sourceforge.net/tutorial.shtml

specifically:

   ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/hg19.ebwt.zip

Various attempts to specify this file, or any of its (unzipped) contents, to QuasR fail with these messages:


   Error: The specified genome /Users/pshannon/s/data/public/bowtie/indexes/hg19.1.ebwt does not have the extension of a fasta file (fa,fasta,fna)
   Error: The specified genome has to be a file and not a directory: /Users/pshannon/s/data/public/bowtie/indexes
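
Read together, the two messages suggest that a file path given as genome must point to a regular file with a fasta extension. The sketch below mirrors only those two checks. check_genome_arg is a hypothetical helper written for illustration, not QuasR's own code, and it ignores the BSgenome package-name case entirely.

```r
# Hypothetical helper, NOT QuasR's implementation: it only mirrors the two
# checks implied by the error messages above.
check_genome_arg <- function(genome) {
  # A directory is rejected outright.
  if (dir.exists(genome)) {
    return("rejected: a directory, not a file")
  }
  # Otherwise the extension must be one of the fasta extensions named in the error.
  ext <- tolower(tools::file_ext(genome))
  if (!ext %in% c("fa", "fasta", "fna")) {
    return("rejected: not a fasta extension (fa, fasta, fna)")
  }
  "accepted as a fasta file"
}

check_genome_arg("hg19.1.ebwt")  # an unzipped bowtie index file
check_genome_arg(tempdir())      # a directory
check_genome_arg("hg19.fa")      # a fasta file name (placeholder)
```

Under these assumed rules, neither an .ebwt file nor the directory that holds the index files can pass, which matches what I see.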


I'll be grateful for advice on how to do this properly.

Thanks,

 - Paul


> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Rsamtools_1.13.14     BSgenome_1.29.0       Biostrings_2.29.2     QuasR_1.1.4           GenomicRanges_1.13.12 XVector_0.1.0        
 [7] IRanges_1.19.8        BiocGenerics_0.7.2    Rbowtie_1.1.3         BiocInstaller_1.11.1 

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.23.11  Biobase_2.21.2         DBI_0.2-7              GenomicFeatures_1.13.8 RCurl_1.95-4.1        
 [6] RSQLite_0.11.3         ShortRead_1.19.3       XML_3.95-0.2           biomaRt_2.17.0         bitops_1.0-5          
[11] compiler_3.0.0         grid_3.0.0             hwriter_1.3            lattice_0.20-15        rtracklayer_1.21.5    
[16] stats4_3.0.0           tools_3.0.0            zlibbioc_1.7.0        

