[BioC] custom SNPlocs / BSgenome package

Hervé Pagès hpages at fhcrc.org
Fri Dec 2 22:52:21 CET 2011


Hi kmo,

On 11-12-01 06:14 AM, km wrote:
> Hi,
>
> On Mon, Nov 28, 2011 at 8:00 AM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>
>> Hi kmo,
>>
>> For the BSgenome, you can follow the "How to forge a BSgenome data
>> package" document,
>>
>>     http://bioconductor.org/packages/2.10/bioc/html/BSgenome.html
>>
>> I'm not sure what you mean by "a SNPlocs package for it". The SNPlocs
>> packages are all Homo sapiens and created from each dbSNP build. They can
>> be found here,
>>
>>     http://www.bioconductor.org/packages/release/data/annotation/
>>
>> Are you asking about creating a snp annotation package for a different
>> organism?
>>
>>
> Yes, how do I include a SNP annotation package for a different organism?

AFAIK, this is the first time someone has asked about non-Human
SNPs on this list.

The scripts used for making the SNPlocs packages are actually bundled in
each package. They are in the inst/tools/ folder (see for example the
SNPlocs.Hsapiens.dbSNP.20110815 package). Here you'll find a mix of R
scripts, shell scripts and a C program (compile the C program with
'gcc -Wall filter2_ds_flat.c -o filter2_ds_flat'). Those scripts are
used in a "pipe" fashion i.e. the output produced by one script is the
input of the next script.
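
To illustrate what "pipe fashion" means here, below is a toy sketch
in shell. It is not the actual pipe: the stage names, the made-up
input records and their id|class|chromosome|position layout are all
hypothetical stand-ins chosen for the example. The real scripts and
the real dbSNP flat file format are what you'll find in inst/tools/.

```shell
#!/bin/sh
# Toy illustration of a staged "pipe": each stage reads the output of
# the previous one. Stage names and record layout are hypothetical;
# they are NOT the real scripts or the real dbSNP flat file format.

# Stage 1: keep only records whose class field is "snp".
stage1_filter() {
    grep '|snp|'
}

# Stage 2: extract id, chromosome and position as space-separated fields.
stage2_extract() {
    awk -F'|' '{ print $1, $3, $4 }'
}

# Stage 3: sort by chromosome, then numerically by position.
stage3_sort() {
    sort -t ' ' -k2,2 -k3,3n
}

# The whole pipe: the output of each stage is the input of the next.
run_pipe() {
    stage1_filter | stage2_extract | stage3_sort
}

# Made-up input records: id|class|chromosome|position
toy_input() {
    printf '%s\n' \
        'rsA|snp|2|500' \
        'rsB|indel|1|40' \
        'rsC|snp|1|300' \
        'rsD|snp|1|20'
}

toy_input | run_pipe
```

In the real pipe the intermediate results are large, so writing each
stage's output to a file and feeding that file to the next step (rather
than chaining everything with '|') makes it easier to rerun a single
step when something goes wrong.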

There's also a README.TXT file in this folder that explains how this
pipe was used to generate the package. Because the pipe uses shell
scripts, you'll need to be on a Linux machine (maybe Mac would work
too, I didn't try).

As Val said, we only make SNPlocs packages for Human at the moment.
We use the "flat" files provided by dbSNP for this, and the pipe in
inst/tools/ only supports those flat files. For Human those files
are huge and hard to parse, which is why the "pipe" uses several
steps and different kinds of scripts (some steps are easier
to implement in a particular language). That also means you'll
need a machine with enough memory (I can't remember exactly how much
is needed for Human, maybe 8GB).
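
A quick way to check a machine against the requirements above (Linux,
gcc, enough memory) could look like the sketch below. The function
names are made up for the example, the 8 GB threshold is only the
rough guess given above and not a measured requirement, and the
memory check assumes /proc/meminfo is available (Linux only).

```shell
#!/bin/sh
# Preflight sketch: warn about conditions that may prevent running the pipe.
# Function names are hypothetical; the 8 GB threshold is a rough guess.

# Print total memory rounded to whole GB, given /proc/meminfo-style
# text on stdin (MemTotal is reported in kB).
meminfo_total_gb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 + 0.5 }'
}

preflight() {
    # The pipe relies on shell scripts and was only used on Linux.
    os=$(uname -s)
    [ "$os" = "Linux" ] || echo "warning: untested platform: $os"

    # A C compiler is needed to build filter2_ds_flat.
    command -v gcc >/dev/null 2>&1 || echo "warning: gcc not found"

    # Memory check, only where /proc/meminfo exists.
    if [ -r /proc/meminfo ]; then
        gb=$(meminfo_total_gb < /proc/meminfo)
        if [ -n "$gb" ] && [ "$gb" -lt 8 ]; then
            echo "warning: only about ${gb} GB of RAM"
        fi
    fi
    return 0
}

preflight
```

The function only prints warnings and never aborts, since a smaller
genome than Human may well need much less memory.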

So if your organism is supported by dbSNP and if they provide flat
files for it, then it should not be too hard to adapt the pipe to
make it work for you.

To make things a little bit more complicated though, the way the
data is stored in a SNPlocs package has changed recently but the pipe
has not been modified yet (this is on my list): it still generates
the data in the old format. There is an extra script in inst/tools/
(update_SNPlocs_data.R) that you will need to run after you've run
the pipe; it converts the data to the new format.

Let me know if you have any questions.

Cheers,
H.


> Thanks
> kmo
>
>>
>>
>>
>> On 11/26/11 11:03, km wrote:
>>
>>> Hi all,
>>> How do I make a custom BSgenome package for a given genome, and a SNPlocs
>>> package for it?
>>> I appreciate any pointers.
>>> Thanks,
>>> Regards,
>>> kmo
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


