[BioC] SNPcounts error in SNPlocs.Hsapiens.dbSNP.20110815_0.99.3 (i.e., my attempt to update SNPlocs to dbSNP build 134)

Hervé Pagès hpages at fhcrc.org
Fri Sep 16 02:54:18 CEST 2011


Hi Tim,

Hard to know what was going on with your home-made
SNPlocs.Hsapiens.dbSNP.20110815 (Build 134). Anyway, the
one I made on my side is now available to R-2.14 / BioC 2.9
users (source package only for now). I built it with
'R CMD build --resave-data', which reduced the size of the
tarball from 242M to 166M without any significant impact on the
time it takes to load the compressed datasets :-)

About 32.4 million SNPs in dbSNP Build 134 vs 23.5 million in Build 132:

 > sum(SNPlocs.Hsapiens.dbSNP.20110815::getSNPcount())
[1] 32393176

 > sum(SNPlocs.Hsapiens.dbSNP.20101109::getSNPcount())
[1] 23533016

The longest run of consecutive SNPs on chr1 is 102 (same as with
Build 132). See ?injectSNPs for one way to find this run.
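For illustration only, here is a minimal base-R sketch of what "longest run of consecutive SNPs" could mean computationally, assuming "consecutive" means SNPs at adjacent genomic positions and that `pos` holds the SNP positions of a single chromosome. The function name longest_consecutive_run is hypothetical, and this is not necessarily the approach shown in ?injectSNPs.

```r
## Hypothetical helper: length of the longest run of SNPs located at
## adjacent positions. 'pos' is assumed to hold the SNP positions of
## a single chromosome.
longest_consecutive_run <- function(pos) {
  pos <- sort(unique(pos))
  if (length(pos) == 0L) return(0L)
  ## TRUE wherever a SNP sits immediately after the previous one
  adjacent <- diff(pos) == 1L
  runs <- rle(adjacent)
  ## a run of k adjacent steps covers k + 1 SNPs
  max(c(0L, runs$lengths[runs$values])) + 1L
}

longest_consecutive_run(c(10, 11, 12, 20, 30, 31))
## returns 3
```

Using rle() on the "is adjacent" flags keeps the whole computation vectorized, which matters when a chromosome carries millions of SNPs.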

Let me know if you find problems with it.

Yes, as Vince said: dbSNP uses ch1, ch2, ..., chMT, and UCSC
uses chr1, chr2, ..., chrM. We use whatever they use. Glad you
found out about seqlevels(x) <- new_names, which is indeed the
standard way to rename the sequences of a GRanges, GRangesList,
or GappedAlignments object.
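A minimal sketch of that renaming, assuming the object uses the dbSNP-style names listed above. The helper dbsnp2ucsc and the object name snps are hypothetical; only the final seqlevels<- assignment (left as a comment, since it needs GenomicRanges and a real object) is the standard mechanism mentioned here.

```r
## Hypothetical helper: convert dbSNP-style sequence names (ch1, ..., chMT)
## to UCSC-style names (chr1, ..., chrM). Names that already start with
## "chr" are left unchanged thanks to the negative lookahead.
dbsnp2ucsc <- function(seqnames) {
  ucsc <- sub("^ch(?!r)", "chr", seqnames, perl = TRUE)
  ucsc[ucsc == "chrMT"] <- "chrM"  # UCSC calls the mitochondrial genome chrM
  ucsc
}

dbsnp2ucsc(c("ch1", "ch22", "chX", "chMT"))
## returns c("chr1", "chr22", "chrX", "chrM")

## With GenomicRanges loaded and a GRanges 'snps' (hypothetical) that uses
## dbSNP names, the renaming itself would be:
##   seqlevels(snps) <- dbsnp2ucsc(seqlevels(snps))
```

Because the helper leaves UCSC-style names alone, running it on an object that has already been renamed is harmless.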

Cheers,
H.


On 11-09-15 01:24 PM, Tim Triche, Jr. wrote:
> it looks like the problem resolved itself... not precisely the way I was
> thinking, but with the same effect to me personally.
>
> the build 134 package will apparently hit the mirrors presently (Herve
> was building it unbeknownst to me) and the process of updating &
> rebuilding a substantial package was educational for me as well.  The
> issue seems to be one of improved/changed documentation and internal
> regression testing since the last build, which is a Good Thing from my
> perspective.
>
> thanks all!
>
> --t
>
>
>
> On Thu, Sep 15, 2011 at 1:14 PM, Vincent Carey
> <stvjc at channing.harvard.edu> wrote:
>
>     I think you'd better have just one SNPlocs.* package in your
>     search list for the diagnosis to be as simple as possible. But I
>     will stop chattering, as HP will surely know the answer.
>
>     On Thu, Sep 15, 2011 at 3:53 PM, Tim Triche, Jr. <ttriche at usc.edu>
>     wrote:
>
>         Sure, and since you might know the answer...
>
>         How come the standard policy for SNPlocs packages is to use
>         (e.g.) $ch1, but for BSgenomes it's $chr1?
>
>         I've been computing overlaps this morning and keep having to
>         change seqlevels() to do it without warnings being raised.
>
>         thanks,
>
>         --t
>
>
>
>         On Thu, Sep 15, 2011 at 12:20 PM, Vincent Carey
>         <stvjc at channing.harvard.edu> wrote:
>
>             Can you give the sessionInfo() where this occurred? Also,
>             I'd recommend triggering the error with
>             options(error=recover) set, so that the call stack can be
>             inspected.
>
>             On Thu, Sep 15, 2011 at 3:15 PM, Tim Triche, Jr.
>             <tim.triche at gmail.com> wrote:
>
>                 Hi Herve (and others),
>
>                 in an attempt to update annotations for some probes, and
>                 after taking your workshop at Bioc2011, I have been
>                 playing with Biostrings, BSgenomes, GRanges, et al.,
>                 mostly to good effect (thank you and others for writing
>                 and documenting them; they're terrific once the learning
>                 curve flattens out). However, I rebuilt the SNPlocs
>                 package following instructions you long ago posted for
>                 Paul Shannon, and I think I have hit a wall in that
>                 respect. I put the offending package (and some others) up:
>
>                 http://flaver.com/annotations/ (in case you need the
>                 actual package for an autopsy)
>
>                 When I load and inject SNPlocs.Hsapiens.dbSNP.20101109,
>                 your package of build 132, everything goes as planned: I
>                 can inject the hard mask with polymorphisms where they're
>                 supposed to be, and everything is hunky-dory. When I do
>                 this after building (without apparent incident) the
>                 b134-based SNPlocs.Hsapiens.dbSNP.20110815_0.99.3, after
>                 editing the scripts to look for 'GRCh37.p2' instead of
>                 'GRCh37' and rebuilding, I get the following error:
>
>                 Error in x %in% SNPlocs_seqnames :
>                   error in evaluating the argument 'table' in selecting a method for function '%in%': Error in getSNPcount() : internal error: 'SNPcount' data set is broken.
>                        Please contact the maintainer of the SNPlocs.Hsapiens.dbSNP.20110815 package.
>
>                 This seems odd, since the internal regression tests all
>                 passed when I built the package. But obviously I must
>                 have missed something somewhere!
>
>                 Any thoughts?
>
>                 Thanks!
>
>
>
>                 --
>                 If people do not believe that mathematics is simple, it
>                 is only because they do not realize how complicated life
>                 is.
>                 John von Neumann
>                 <http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Von_Neumann.html>
>
>
>                 _______________________________________________
>                 Bioconductor mailing list
>                 Bioconductor at r-project.org
>                 https://stat.ethz.ch/mailman/listinfo/bioconductor
>                 Search the archives:
>                 http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
>
>         --
>         When you emerge in a few years, you can ask someone what you
>         missed, and you'll find it can be summed up in a few minutes.
>
>         Derek Sivers <http://sivers.org/berklee>
>
>
>
>
>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


