[BioC] question about forgeBSgenomeDataPkg function

Hervé Pagès hpages at fhcrc.org
Mon Mar 8 22:50:03 CET 2010


Hi Brian,

I'm putting this on the mailing list since this might actually affect
other users.

Brian Herb wrote:
> Herve-
>  
> You previously helped me build the BSgenome package for the rat, 
> and now I am helping my lab mate create a BSgenome package for the 
> rhesus monkey. We are running into an error when he reads in the gap files:
>  
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, 
> na.strings,  :
>   scan() expected 'an integer', got 'fragment'
>  
> We wonder whether the issue is that the gap files are in a slightly different 
> format from the one I am used to with the rat:
>  
> Example Rat gap file:
>  
> 585 chr1 1360 2576 2 N 1216 fragment yes
> 585 chr1 5378 5428 4 N 50 fragment yes
> 585 chr1 13845 13895 6 N 50 fragment yes
> 585 chr1 23435 23485 8 N 50 fragment yes
> 585 chr1 25955 26005 10 N 50 fragment yes
> 585 chr1 33306 33356 12 N 50 fragment yes
> 585 chr1 35384 40627 14 N 5243 fragment yes
> 585 chr1 45904 46169 16 N 265 fragment yes
>  
> Example rhesus monkey gap file:
>  
>  
> chr1 17248 17350 2 N 102 fragment yes
> chr1 26206 26619 4 N 413 fragment yes
> chr1 27937 28130 6 N 193 fragment yes
> chr1 47170 48593 8 N 1423 fragment yes
> chr1 83907 85189 10 N 1282 fragment yes
> chr1 95455 96505 12 N 1050 fragment yes
> chr1 100303 100323 14 N 20 fragment yes
> chr1 132263 132283 16 N 20 fragment yes
> chr1 151325 152178 18 N 853 fragment yes

Good catch! It seems we indeed can't assume that UCSC uses a
consistent schema for their 'gap' table. For rat and every other organism
I've seen so far, the columns are as follows:

http://genome.ucsc.edu/cgi-bin/hgTables?db=rn4&hgta_group=map&hgta_track=gap&hgta_table=gap&hgta_doSchema=describe+table+schema

but for Rhesus, the 'bin' column is missing:

http://genome.ucsc.edu/cgi-bin/hgTables?db=rheMac2&hgta_group=map&hgta_track=gap&hgta_table=gap&hgta_doSchema=describe+table+schema

I've tried to accommodate this in read.gapMask(). This change will be
available in IRanges >= 1.5.56 (devel) and IRanges >= 1.4.13 (release).
Both packages should become available through biocLite() within the next 24
hours. Please let me know if you run into any further issues.
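The idea of accepting both layouts can be illustrated with a small, self-contained Python sketch. It is not the read.gapMask() code; parse_gap_line() and to_one_based() are hypothetical names. The sketch assumes whitespace-separated dumps of the UCSC gap table with the column order from the schema pages above (bin, chrom, chromStart, chromEnd, ix, n, size, type, bridge), where bin is absent for rheMac2. It tells the two layouts apart by field count and checks each row against its 'size' column.

```python
# Hypothetical sketch, NOT the IRanges read.gapMask() implementation.
# Parses one line of a UCSC 'gap' table dump, accepting both the 9-column
# layout (leading 'bin' column, e.g. rn4) and the 8-column layout without it
# (e.g. rheMac2).

GAP_COLUMNS = ["chrom", "chromStart", "chromEnd", "ix", "n", "size", "type", "bridge"]
INT_COLUMNS = ("chromStart", "chromEnd", "ix", "size")


def parse_gap_line(line):
    """Return a dict keyed by GAP_COLUMNS, dropping 'bin' if present."""
    fields = line.split()
    if len(fields) == len(GAP_COLUMNS) + 1:
        fields = fields[1:]  # 9 fields: discard the leading 'bin' value
    elif len(fields) != len(GAP_COLUMNS):
        raise ValueError(f"expected 8 or 9 fields, got {len(fields)}: {line!r}")
    rec = dict(zip(GAP_COLUMNS, fields))
    for key in INT_COLUMNS:
        rec[key] = int(rec[key])
    # UCSC coordinates are 0-based and half-open, so width = chromEnd - chromStart,
    # which should agree with the 'size' column.
    if rec["chromEnd"] - rec["chromStart"] != rec["size"]:
        raise ValueError(f"size column disagrees with coordinates: {line!r}")
    return rec


def to_one_based(rec):
    """Convert a parsed gap to 1-based, closed (start, end) coordinates."""
    return rec["chromStart"] + 1, rec["chromEnd"]


if __name__ == "__main__":
    rat = parse_gap_line("585 chr1 1360 2576 2 N 1216 fragment yes")
    rhesus = parse_gap_line("chr1 17248 17350 2 N 102 fragment yes")
    print(rat["chrom"], to_one_based(rat))        # chr1 (1361, 2576)
    print(rhesus["chrom"], to_one_based(rhesus))  # chr1 (17249, 17350)
```

Detecting the layout by field count works here because every gap column is a single whitespace-free token; a reader driven by an explicit schema (or a header line) would be more robust if UCSC ever changed the column set again.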

Thanks for the report!
H.

>  
> We wonder whether the missing column in the monkey gap file is throwing off 
> the forgeMasksFiles function, and whether there is something we can 
> specify in this function to change which column it looks for.
>  
> 
>  > sessionInfo()
> R version 2.10.0 (2009-10-26)
> x86_64-unknown-linux-gnu
> locale:
>  [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
>  [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
>  [5] LC_MONETARY=C                  LC_MESSAGES=en_US.iso885915
>  [7] LC_PAPER=en_US.iso885915       LC_NAME=C
>  [9] LC_ADDRESS=C                   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> other attached packages:
> [1] BSgenome_1.14.2    Biostrings_2.14.10 IRanges_1.4.9
> loaded via a namespace (and not attached):
> [1] Biobase_2.6.0
>  
> Kind Regards,
> Brian
> 
> 
> -- 
> Brian Herb
> Graduate Program in Biochemistry, Cellular and Molecular Biology
> Johns Hopkins School of Medicine
> Dr. Andrew Feinberg Laboratory
> Rangos 580
> 855 N. Wolfe St.
> Baltimore, MD 21205
> Phone: 410-614-3479
> Fax: 410-614-9819

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


