[BioC] (BSgenome) forgeBSgenomeDataPkg for Sus scrofa problem

Hervé Pagès hpages at fhcrc.org
Mon Oct 10 21:43:31 CEST 2011


Hi Elisabetta,

Handling of a missing nmask_per_seq field was broken (it should have
defaulted to 0 when missing). I just fixed this in BSgenome release
(1.20.1) and devel (1.21.7). In your case, though, it looks like you
*do* have masks, so you need to set the nmask_per_seq field explicitly
to a non-zero value in your seed file. For example, if you have the 4
"standard" masks:

nmask_per_seq: 4

You can look at the seed file for hg19 in the BSgenome package
(BSgenome/inst/extdata/GentlemanLab/BSgenome.Hsapiens.UCSC.hg19-seed)
for an example.
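
As a rough sketch (not a tested seed file), and assuming you want all 4
standard masks (AGAPS, AMB, RM, TRF), the mask-related part of your
susScr2 seed file would look something like this, with everything
except the first line taken unchanged from what you posted:

nmask_per_seq: 4
masks_srcdir: /mnt/files/cbil/data/cbil/UHTS/Davies/AAvsDT_DNAmethyl/working_dir/MEDIPS/BSgenome.Sscrofa.UCSC.susScr2/masks
AGAPSfiles_type: gap
AGAPSfiles_name: gap.txt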

Please let me know if you have further questions about this.

Cheers,
H.


On 11-10-07 11:21 AM, Elisabetta Manduchi wrote:
>
> Hello,
> I'm trying to build a data package for Sus scrofa with BSgenome (R
> version 2.13.2 and BSgenome version 1.20.0).
> At the bottom of this email I've copied my seed file.
> I've downloaded the sequence files from UCSC and checked the md5sums.
> I've also downloaded the gap.txt and masks files (chr*.fa.out and
> chr*.bed) from UCSC (but no md5sums were provided).
> I've followed the instructions from
> http://bioconductor.org/packages/2.8/bioc/vignettes/BSgenome/inst/doc/BSgenomeForge.pdf
>
> and I'm getting the following error
>
> ---
>> forgeBSgenomeDataPkg("./BSgenome.Sscrofa.UCSC.susScr2-seed")
> Error in forgeBSgenomeDataPkg(y, seqs_srcdir = seqs_srcdir, masks_srcdir = masks_srcdir, :
>   values for symbols NMASKPERSEQ are not single strings
> ---
>
> Can you advise on what the problem might be?
> Thanks,
> Elisabetta
>
>
> *SEED file BSgenome.Sscrofa.UCSC.susScr2-seed*
>
> Package: BSgenome.Sscrofa.UCSC.susScr2
> Title: Sus scrofa (Pig) full genome (UCSC version susScr2)
> Description: Sus scrofa (Pig) full genome as provided by UCSC (susScr2,
> Nov. 2009)
> Version: 0.1-0
> Author: Elisabetta Manduchi <manduchi at pcbi.upenn.edu>
> Maintainer: Elisabetta Manduchi <manduchi at pcbi.upenn.edu>
> License: GPL-3
> organism: Sus scrofa
> species: Pig
> provider: UCSC
> provider_version: susScr2
> release_date: Nov. 2009
> release_name: SGSC Sscrofa9.2
> source_url: http://hgdownload.cse.ucsc.edu/goldenPath/susScr2/
> organism_biocview: Sus_scrofa
> BSgenomeObjname: Sscrofa
> seqnames: paste("chr", c(1:18, "X", "M"), sep="")
> circ_seqs: "chrM"
> SrcDataFiles1: sequences: all the chr*.fa.gz files from
> ftp://hgdownload.cse.ucsc.edu/goldenPath/susScr2/chromosomes/
> SrcDataFiles2: AGAPS masks: the gap.txt.gz file from
> http://hgdownload.cse.ucsc.edu/goldenPath/susScr2/database/;
> RM masks: http://hgdownload.cse.ucsc.edu/goldenPath/susScr2/bigZips/chromOut.tar.gz;
> TRF masks: http://hgdownload.cse.ucsc.edu/goldenPath/susScr2/bigZips/chromTrf.tar.gz
> seqs_srcdir: /mnt/files/cbil/data/cbil/UHTS/Davies/AAvsDT_DNAmethyl/working_dir/MEDIPS/BSgenome.Sscrofa.UCSC.susScr2/seqs
> masks_srcdir: /mnt/files/cbil/data/cbil/UHTS/Davies/AAvsDT_DNAmethyl/working_dir/MEDIPS/BSgenome.Sscrofa.UCSC.susScr2/masks
> AGAPSfiles_type: gap
> AGAPSfiles_name: gap.txt


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


