[BioC] How to plot gene on their chromosome?

Martin Morgan mtmorgan at fhcrc.org
Wed May 27 18:50:51 CEST 2009


Simon Noël <simon.noel.2 at ulaval.ca> writes:

> Whell, now I got the result I wanted.  In R, I use
>
>>library(org.Hs.eg.db)
>
>>mysymbols <- scan("./Gene_list.csv", what = 'character', skip =1)
>>myEgIDs <- unlist(mget(mysymbols, org.Hs.egSYMBOL2EG))
>>write.table(myEgIDs,file="myEgIDs.xls")
>
> After, I use my .xls with the program stripe.

Hi Simon --

You say the summer has just begun, so... Here's something that might
be fun to work with...

I use the lattice package to create the plot. I load the package,
define two 'helper' functions, and then a wrapper to do what I want...

  library(lattice)
  prepanel.idio <-
      function(chrs, ..., xlim)
  {
      if (missing(xlim))
          list(xlim=c(1, max(chrs[["Length"]])))
      else
          list(xlim=xlim)
  }

  panel.idio <-
      function(x, y, chrs, ..., gtick=.2, colp="blue", colm="red")
  {
      ## helpers: y coordinates at integer values
      yy <- unique(y)
      o <- order(yy)
      yi <- match(y, yy[o])
      ## draw 'chromosomes'
      with(chrs, {
          idx <- match(yy, Chromosome)
          panel.segments(1, seq_along(yy), Length[idx][o], seq_along(yy))
      })
      ## plot genes
      panel.segments(abs(x), yi, abs(x),
                     ifelse(x<0, yi-gtick, yi+gtick),
                     ..., col=ifelse(x<0, colm, colp))
}

  idioplot <-
      function(chrs, ..., prepanel=prepanel.idio, panel=panel.idio)
  {
      xyplot(..., prepanel=prepanel, panel=panel, chrs=chrs)
  }

Then I get the data needed to do the plots

  library(org.Hs.eg.db)
  sym <- c("ACSL1", "ACTG2", "ADAMTSL2", "ADH1C",
           "ADORA1", "ADRA2A", "AHNAK", "AIM2", "ALAS2")
  id <- unlist(mget(sym, org.Hs.egSYMBOL2EG))
  chrloc <- mget(id, org.Hs.egCHRLOC)
  chr <- unlist(lapply(chrloc, names), use.names=FALSE)

and form it into the data structures needed to use my functions

  genes <- data.frame(Id=rep(id, sapply(chrloc, length)),
                      Chromosome=factor(chr,
                        levels=c(1:22, "X", "Y", "M")),
                      Position=unlist(chrloc, use.names=FALSE))
  chrs <- data.frame(Chromosome=factor(
                       names(org.Hs.egCHRLENGTHS),
                       levels=c(1:22, "X", "Y", "M")),
                     Length=org.Hs.egCHRLENGTHS)

Finally I draw the plot

  idio <- idioplot(chrs, Chromosome~Position, genes)
  show(idio)

I should have access to all the regular plot formating parameters (see
?xyplot, ?panel.xyplot, ?segments, ?panel.segments), in addition to
changing how long the gene markings are, and the colors of the
segments for each gene. You even have 'zoom' capabilities

  idioplot(chrs, Chromosome~Position, genes[genes$Chromosome=="11", ],
           xlim=c(61000000, 63000000))

or, a little more hack-y (I don't know how to update the y limits when
y is a factor)

  update(idio, ylim=c(5.5,6.5), xlim=c(61000000, 63000000))

More generally you might look at rtracklayer (for exporting to the
UCSC genome browser), GenomeGraph (for plotting gene-level
information, for instance), geneplotter, idiogram, ...

Martin

> Thans's you for your help.
>
> Does anyone have an idea about my problem with qc()?
>
> Selon Simon Noël <simon.noel.2 at ulaval.ca>, 23.05.2009:
>
>> I have try your procedure and every thing is loking fine for now.  What's the
>> netx step to plot my genes on their chromosomes?
>>
>> Selon Simon Noël <simon.noel.2 at ulaval.ca>, 21.05.2009:
>>
>> > I am sorry.  Has I say, I am new to R and bioconductor.  I am still
>> learning.
>> >
>> > I will try this tomorow and I will write you back to tell you the result.
>> > Thank's you for your help.  Il it's working, I will only have to wory about
>> > my
>> > problem with qc()...  Or at least for now.  The summer is really young;)
>> > Selon Hervé Pagès <hpages at fhcrc.org>, 21.05.2009:
>> >
>> > > Hi Simon,
>> > >
>> > > Not a good idea to start a new thread by replying to a different thread
>> > > you started previously. Then it shows up under the previous thread even
>> > > if you changed the subject.
>> > >
>> > > more below...
>> > >
>> > > Simon Noël wrote:
>> > > > Hello every one.  I have a question.  I have a gene list in a .xls like
>> > > >
>> > > > probeID	Symbol
>> > > > 1030431	ACSL1
>> > > > 4610431	ACTG2
>> > > > 4810575	ADAMTSL2
>> > > > 1510750	ADH1C
>> > > > 4060519	ADORA1
>> > > > 5720523	ADRA2A
>> > > > 2810482	AHNAK
>> > > > 1260270	AIM2
>> > > > 4180768	ALAS2
>> > > > ...     ...
>> > > >
>> > > > I want to plote all of those genes on their chromosome.  How can I do
>> > this?
>> > >
>> > > So first you need to map each gene to its chromosome location.
>> > >
>> > > You can use one of the org.*.eg.db annotation packages for
>> > > this (pick up the one for your organism):
>> > >
>> > >    http://bioconductor.org/packages/release/data/annotation/
>> > >
>> > > and use the SYMBOL2EG map to map your gene symbols to their corresponding
>> > > Entrez IDs and then the CHRLOC map to map your Entrez IDs to their
>> > chromosome
>> > > locations.
>> > >
>> > > Example:
>> > >
>> > >    library(org.Hs.eg.db)
>> > >    mysymbols <- c("ACSL1", "ACTG2", "ADAMTSL2", "ADH1C",
>> > >                   "ADORA1", "ADRA2A", "AHNAK", "AIM2", "ALAS2")
>> > >    myEgIDs <- unlist(mget(mysymbols, org.Hs.egSYMBOL2EG))
>> > >    mylocs <- unname(unlist(mget(myEgIDs, org.Hs.egCHRLOC)))
>> > >
>> > > One thing to be aware of is that those mappings are not necessarily
>> > > one-to-one e.g. the same symbol can be associated with different genes:
>> > >
>> > >    > flat <- toTable(org.Hs.egSYMBOL2EG)
>> > >    > names(flat)
>> > >    [1] "gene_id" "symbol"
>> > >    > any(duplicated(flat$gene_id))
>> > >    [1] FALSE
>> > >    > any(duplicated(flat$symbol))
>> > >    [1] TRUE
>> > >
>> > > The same thing happens with the org.Hs.egCHRLOC map (I'm not sure
>> > > why we have this though, may be others on the list can explain).
>> > >
>> > > Anyway this explains why 'mylocs' can have more elements than
>> 'mysymbols'.
>> > >
>> > > Cheers,
>> > > H.
>> > >
>> > > >
>> > > > Simon Noël
>> > > > VP Externe CADEUL
>> > > > Association des étudiants et étudiantes en Biochimie, Bio-
>> > > > informatique et Microbiologie de l'Université Laval
>> > > > CdeC
>> > > >
>> > > > _______________________________________________
>> > > > Bioconductor mailing list
>> > > > Bioconductor at stat.math.ethz.ch
>> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > > > Search the archives:
>> > > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> > >
>> > > --
>> > > Hervé Pagès
>> > >
>> > > Program in Computational Biology
>> > > Division of Public Health Sciences
>> > > Fred Hutchinson Cancer Research Center
>> > > 1100 Fairview Ave. N, M2-B876
>> > > P.O. Box 19024
>> > > Seattle, WA 98109-1024
>> > >
>> > > E-mail: hpages at fhcrc.org
>> > > Phone:  (206) 667-5791
>> > > Fax:    (206) 667-1319
>> > >
>> > >
>> >
>> >
>> > Simon Noël
>> > VP Externe CADEUL
>> > Association des étudiants et étudiantes en Biochimie, Bio-
>> > informatique et Microbiologie de l'Université Laval
>> > CdeC
>>
>>
>> Simon Noël
>> VP Externe CADEUL
>> Association des étudiants et étudiantes en Biochimie, Bio-
>> informatique et Microbiologie de l'Université Laval
>> CdeC
>
>
> Simon Noël
> VP Externe CADEUL
> Association des étudiants et étudiantes en Biochimie, Bio-
> informatique et Microbiologie de l'Université Laval
> CdeC
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list