[BioC] Ggallus BS genome package

Hervé Pagès hpages at fhcrc.org
Wed May 18 04:19:28 CEST 2011


Hi Namyoung, Sean,

On 11-05-16 01:49 PM, Sean MacEachern wrote:
> Hi,
>
> I've never used them, so I may be incorrect, but I believe they are
> the corresponding 1000, 2000 or 5000 bases upstream of a particular
> feature on the genome (e.g. a probe or a gene). These may be useful for
> identifying potential regulatory regions or designing custom primers,
> etc.

Correct. Like the chromosome sequences, the "upstream" sequences
were downloaded from here

   http://hgdownload.cse.ucsc.edu/goldenPath/galGal3/bigZips/

at the time the BSgenome.Ggallus.UCSC.galGal3 package was made (i.e. a long
time ago). There is some information at the above URL about these
upstream sequences. Note that UCSC updates those sequences weekly
(they are based on RefSeq genes) so the sequences in the BSgenome
data package are probably very outdated (I'm talking about the
"upstream" sequences here, the chromosome sequences for galGal3
will never change).

Including these upstream sequences (when they are available) is
something we've been doing for a very long time, as a convenience,
for many organisms. But these days if you want something
accurate/up-to-date you should consider retrieving the latest RefSeq
gene/transcript/exon locations with the makeTranscriptDbFromUCSC()
function (from the GenomicFeatures package), compute the genomic
ranges corresponding to the upstream regions (in a GRanges object),
and then use the getSeq() function from the BSgenome package to
extract the upstream sequences.
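
To make the "compute the genomic ranges corresponding to the upstream
regions" step a bit more concrete, here is a minimal sketch of the
strand-aware arithmetic in plain Python. It is only an illustration and
not the GenomicFeatures/BSgenome API: upstream_range() and extract() are
hypothetical helper names, and coordinates are assumed to be 1-based and
inclusive. The key point is that "upstream" depends on the strand: it
lies before the start of a feature on the + strand and after the end of
a feature on the - strand.

```python
from typing import Optional, Tuple

# Translation table used to complement a DNA string.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream_range(start: int, end: int, strand: str, width: int,
                   chrom_len: int) -> Optional[Tuple[int, int]]:
    """Return the 1-based, inclusive (start, end) of the region `width`
    bases upstream of a feature, clipped to the chromosome bounds.
    Returns None if nothing is left after clipping."""
    if strand == "+":
        up_start, up_end = start - width, start - 1
    elif strand == "-":
        up_start, up_end = end + 1, end + width
    else:
        raise ValueError("strand must be '+' or '-'")
    up_start = max(up_start, 1)
    up_end = min(up_end, chrom_len)
    if up_start > up_end:
        return None
    return up_start, up_end


def extract(chrom_seq: str, start: int, end: int, strand: str) -> str:
    """Extract [start, end] (1-based, inclusive) from a chromosome string.
    Minus-strand regions are reverse-complemented so the result reads
    5'->3' relative to the feature."""
    seq = chrom_seq[start - 1:end]
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq
```

For example, upstream_range(101, 200, "+", 50, 1000) gives (51, 100),
while the same feature on the - strand gives (201, 250). Features close
to a chromosome end get a shorter (or no) upstream region.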

Cheers,
H.

>
> Cheers,
>
> Sean
>
> On Mon, May 16, 2011 at 3:22 PM, Namyoung Jung<jnamyoung at gmail.com>  wrote:
>> Hi, Sean
>>
>> I have one more question.
>> Could you explain what the multiple sequences labeled as 'upstream1000,
>> upstream2000, upstream5000' in the galGal3 assembly are?
>> Thanks.
>>
>>
>> 2011/5/4 Sean MacEachern<sean.maceach at gmail.com>
>>>
>>> Hi Namyoung,
>>>
>>> Unfortunately the chicken genome is not as complete as many of the
>>> mammalian genomes. There is a new release that is due to come
>>> out later this year, but that most likely will still be missing many
>>> of the microchromosomes. There is hope that when single-molecule
>>> sequencing is reliably running, this may improve things.
>>>
>>> Sean
>>>
>>> On Tue, May 3, 2011 at 1:26 PM, Namyoung Jung<jnamyoung at gmail.com>  wrote:
>>>> Hi,
>>>> I'm a graduate student at Hopkins and working with the chicken BSgenome.
>>>> Has anyone noticed that some chromosomes, like 21-28, are missing in the
>>>> package?
>>>> If anyone can explain this, it'll be helpful for my project.
>>>> Thanks.
>>>>
>>>> All the best,
>>>> --
>>>> Namyoung Jung
>>>> PhD candidate, Department of Cellular and Molecular Medicine
>>>> Johns Hopkins School of Medicine, Baltimore, MD, 21205
>>>> njung1 at jhmi.edu
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


