[BioC] Help with Gviz "IdeogramTrack" and "BioMartGeneTrackRegion" commands

Hahne, Florian florian.hahne at novartis.com
Mon Feb 11 14:48:56 CET 2013


Hi Marc,
Thanks for the hint, but I think this is not quite what I need. My problem
is still at the level of genomes. UCSC, for instance, calls a particular
version of the human genome hg19. A similar genome exists in Ensembl, but
under a different name (GRCh37.p10). Early on I made the maybe somewhat
unwise attempt to identify genomes within Gviz by their UCSC names and to
translate those names into Ensembl names if necessary. In hindsight this
may not have been the smartest decision, and I should have left the
translation job completely to the user. If somebody wants Ensembl gene
models from biomaRt, they should make sure that they select the correct
mart and dataset in the first place.
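
To illustrate what leaving the translation to the user could look like, here
is a minimal base-R sketch. It is not Gviz code: the names ucsc_to_ensembl
and translate_genome are hypothetical, and the table holds only the two
pairs mentioned in this thread. The point is that an unknown name fails
loudly instead of silently falling back to a stale mapping.

```r
# Hypothetical lookup table (UCSC name -> Ensembl name); only the two
# pairs mentioned in this thread are listed.
ucsc_to_ensembl <- c(hg19 = "GRCh37.p10", danRer7 = "Zv9")

# Hypothetical helper: translate one UCSC genome name, or stop with an
# error so that the user has to pick the mart and dataset explicitly.
translate_genome <- function(ucsc_name, mapping = ucsc_to_ensembl) {
  hit <- mapping[ucsc_name]
  if (is.na(hit)) {
    stop("no Ensembl name known for '", ucsc_name,
         "'; please select the mart and dataset yourself", call. = FALSE)
  }
  unname(hit)
}

translate_genome("hg19")
```

Asking for a name that is not in the table, e.g. translate_genome("danRer6"),
raises an error rather than guessing.
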
I'll think about a pragmatic way out of this hole I've dug myself into.
Florian

On 1/23/13 2:18 AM, "Marc Carlson" <mcarlson at fhcrc.org> wrote:

>Hi Florian,
>
>We actually have a small database called seqnames.db that is dedicated
>to tracking these kinds of chromosome name conventions.  You can see
>more by looking at the help page for supportedSeqnameStyles() (and its
>friends).  A quick way to see that is:
>
>library(Homo.sapiens)
>?supportedSeqnameStyles
>
>
>If you call the supportedSeqnameStyles() method, you will see that we
>don't (yet) have an entry for zebrafish. If you were to give me one as a
>tab file, I could add it to the database and it would therefore exist
>for the future...  The file I need is deliberately simple to make.  It
>should look like the example below, with as many columns as you want
>there to be styles for, and each column separated by a tab.
>
>NCBI    MSU6
>1       1
>2       2
>3       3
>4       4
>
>etc.
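
A minimal base-R sketch of how a file in this layout could be read and used
to translate between styles. This is only an illustration, not how
seqnames.db works internally: convert_seqnames is a hypothetical helper, and
the text below simply repeats the four example rows instead of reading a
real file.

```r
# The example rows from above, tab-separated (stand-in for a real file).
tab_text <- "NCBI\tMSU6\n1\t1\n2\t2\n3\t3\n4\t4\n"

# Read every column as character so that names like "1" and "X" can mix.
styles <- read.delim(text = tab_text, colClasses = "character")

# Hypothetical helper: map seqnames from one style column to another and
# stop if any name is missing from the source column.
convert_seqnames <- function(x, from, to, style_table = styles) {
  idx <- match(x, style_table[[from]])
  if (anyNA(idx)) {
    stop("unknown seqname(s): ", paste(x[is.na(idx)], collapse = ", "))
  }
  style_table[[to]][idx]
}

convert_seqnames(c("2", "4"), from = "NCBI", to = "MSU6")
```
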
>
>
>   Marc
>
>
>
>
>
>On 01/21/2013 09:15 AM, Hahne, Florian wrote:
>> Hi Joseph,
>>
>> Regarding your first problem: UCSC has no cytoband information for any of
>> the zebrafish genomes, and that's what is throwing the error. I think it
>> should do something smarter, e.g. use the chromosome length information
>> that should be available for every UCSC genome to draw at least a blank
>> ideogram which could still be used to indicate the current plotting
>> position. I'll have this ready in the next release of the package, and
>> maybe even port this back to the current release. It seems to be more of
>> a bug than a missing feature...
>>
>> Your second problem is a bit more tricky. There is no real mapping between
>> the Ensembl genome names used in the biomaRt package and the UCSC ones
>> which I decided to take as the defaults for the package. I tried to come
>> up with my own static mapping for this, and obviously this means that
>> things tend to get out of date soon. Now the zebrafish version that you
>> will get in Ensembl is Zv9 (which is equivalent to danRer7), but my
>> mapping is still to danRer6. This is plainly wrong, because what you will
>> get from Biomart if you ask for danRer6 now is actually danRer7. Yikes. I
>> will have to come up with a better solution for this. There should be a
>> way to explicitly control which Ensembl genome you will get, and this is
>> a simple change. Getting it right automagically is way more challenging,
>> I am afraid.
>>
>> As a quick fix for you:
>> Ask for the danRer6 genes and manually change the genome of the track:
>> 
>> biomTrack <- BiomartGeneRegionTrack(genome="danRer6", chromosome=1,
>>     start=1e6, end=1e6+10000, name="ENSEMBL", showId=TRUE)
>> genome(biomTrack) <- "danRer7"
>>
>> I'll get back to you once I have a better solution.
>>
>> Florian