[BioC] Pathview with non-KEGG organisms?

Luo Weijun luo_weijun at yahoo.com
Thu Aug 1 19:25:06 CEST 2013

Hi Iain,
I am attaching the KO gene set data you need for your analysis. These data were generated the same way as the gene set data in the gage and gageData packages. For other users with similar needs, I will also provide the KO gene set data in the development version of the pathview or gageData package soon. For your analysis, you may do something like:

#load gage, then check the ko gene set data (load the attached data first)
library(gage)
lapply(ko.sets[1:3], head)

#GAGE analysis
gage.res <- gage(your.data.log2, ref = 1, samp = 2, gsets = ko.sets[sigmet.idx])

Here I log2-transformed your data, as is common in array and NGS data analysis, although data on the original scale will most likely also give sensible results. I did not use all KO pathways, only the signaling and metabolic pathways (hence excluding the disease pathways, which may be less relevant for your analysis). Note that I used only 1 control and 1 experiment sample in the above analysis.
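
The log2 step described above can be sketched roughly as follows. This is only an illustration, not the exact code behind the attached analysis: the values are the first rows of the KO table quoted later in this thread, the column names (ctrl, trt1, trt2) are made up, and the pseudo-count of 1 is an assumption added to guard against zeros.

```r
# Toy KO-level expression matrix; values copied from the table quoted below.
# The column names are hypothetical labels, not taken from the original data.
your.data <- matrix(
  c(1584.06859, 1595.09485, 1437.64499,
     143.25284,  239.21267,  237.12022,
     222.57466,  227.87069,  104.46555),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("K00005", "K00012", "K00013"),
                  c("ctrl", "trt1", "trt2"))
)

# log2 transform; the pseudo-count of 1 is an assumption to avoid log2(0)
your.data.log2 <- log2(your.data + 1)

# gage() takes column indices: ref = control column(s), samp = experiment column(s)
ref.idx  <- 1
samp.idx <- 2

# log2 fold change of the experiment column over the control column
log2.fc <- your.data.log2[, samp.idx] - your.data.log2[, ref.idx]
round(log2.fc, 3)
```

The same ref.idx and samp.idx values correspond to the ref and samp arguments in the gage() call above.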
Once you get the list of significant pathways, you may pass it, together with your.data.log2, to pathview for visualization.
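
A rough sketch of that hand-off, under some assumptions: the table below is a mock of the "greater" component of a gage result (the values are invented), the set names are assumed to start with "ko" followed by the 5-digit pathway number, and the q-value cutoff of 0.1 is arbitrary. The pathview() call is left commented out because it needs the package installed and downloads files from KEGG.

```r
# Mock of the 'greater' table returned by gage(): set names as row names and
# a "q.val" column (other columns omitted). All values here are invented.
greater <- matrix(
  c(0.001, 0.02, 0.40, NA),
  ncol = 1,
  dimnames = list(
    c("ko00010 Glycolysis / Gluconeogenesis",
      "ko00020 Citrate cycle (TCA cycle)",
      "ko00030 Pentose phosphate pathway",
      "ko04010 MAPK signaling pathway"),
    "q.val")
)

# Keep sets below the q-value cutoff (0.1 is an arbitrary choice), dropping NAs
sel <- !is.na(greater[, "q.val"]) & greater[, "q.val"] < 0.1
path.names <- rownames(greater)[sel]

# Extract the 5-digit pathway number from names like "ko00010 Glycolysis ..."
path.ids <- sub("^ko([0-9]{5}).*$", "\\1", path.names)
path.ids

# With pathview installed and network access, each pathway could then be drawn:
# library(pathview)
# for (pid in path.ids) {
#   pathview(gene.data = your.data.log2[, 2] - your.data.log2[, 1],
#            pathway.id = pid, species = "ko", out.suffix = "trt.vs.ctrl")
# }
```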
The KEGG gene sets in the gage package are for human. I also provide KEGG and GO gene sets for several major research species in the gageData package. You may want to take a look there if you need to work on other species.

As for viewing multiple experiments on a KEGG pathway, I’ve been wondering whether I should provide that function in pathview. I haven’t so far because it looks quite messy whether you divide each node into multiple pieces or draw bar or line plots beside the nodes. I may still provide that function in a future release if I see enough interest in the user community. For now, you can always generate one graph for each experimental condition.
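
One way to organize the one-graph-per-condition approach is sketched below. The matrix, its values and its column names are hypothetical, and pathway "00010" is just an example ID. The pathview() loop is commented out because it requires the package and network access.

```r
# Hypothetical log2 expression matrix: one control and three treatment columns
expr.log2 <- matrix(
  c(10.0, 10.5, 9.0, 12.0,
     7.0,  7.0, 8.0,  6.5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("K00005", "K00012"),
                  c("ctrl", "trt1", "trt2", "trt3"))
)

# log2 fold change of every treatment against the control column
treatments <- setdiff(colnames(expr.log2), "ctrl")
fc <- expr.log2[, treatments, drop = FALSE] - expr.log2[, "ctrl"]

# One named vector per condition; each could feed a separate pathview() call
fc.by.condition <- lapply(treatments, function(trt) fc[, trt])
names(fc.by.condition) <- treatments
fc.by.condition

# One output graph per treatment, distinguished by out.suffix:
# library(pathview)
# for (trt in treatments) {
#   pathview(gene.data = fc.by.condition[[trt]], pathway.id = "00010",
#            species = "ko", out.suffix = trt)
# }
```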

On Tue, 7/30/13, Iain wrote:

Subject: Re: Pathview with non-KEGG organisms?

Date: Tuesday, July 30, 2013, 1:56 PM

Hey Weijun,

Thanks for the clarification on graphviz.

I'm trying to run gage right now, but am running into a few hurdles. I have my expression data in the proper format:

K00005 1584.06859 1595.09485 1437.64499
K00012  143.25284  239.21267  237.12022
K00013  222.57466  227.87069  104.46555
K00014   40.25286   28.87049   34.82185
K00018  268.74706  182.50277  192.34927
113.65515   77.33168   94.51645

However, I'm a little stuck with the gsets. I downloaded a KEGG gset according to the instructions and got something like this:

55902 2645 5232 5230 5162 5160 5161 55276 7167 84532 2203 125 3099 126
3098 3101 127 5224 128 5223 124 230 501 92483 5313 160287 2023 5315
5214 669 5106 5105 219 217 218 10327 8789 5213 5211 3948 2597 2027
2026 441531 131 130 3945 220 221 222 223 224 130589 226 1738 1737 229
57818 3939 2538 5236 2821

... but now it seems like I have to map these number ids to the KO ids in my expression set? Is it not possible to use a similar approach as before (species="ko") to have gage simply recognize the
Otherwise, I'm not quite sure how to map KO ids to Entrez IDs because the id2eg function doesn't contain a KO id option.

With regard to the node attributes that I'd like...I'm not sure. I have multiple experimental conditions to represent on the pathway (treatment 1, treatment 2, treatment 3). I'd like to visually compare expression under each treatment on the pathway. I suppose this could be represented by bar graphs next to the nodes, but this will be messy. Another idea would be to scale edges according to their expression values. So have three different color connecting nodes (one for each treatment) and scale line according to expression. The problem with the current color scale is that it can only represent one value (a p-val, or a log fold
If I have 3 expression values (say for a gene, treatment 1 =
treatment 2 = 100, treatment 3 = 10000), I'm trying to think of a way to compare these visually on the pathway.

Your help and comments are much appreciated.


> Iain,
> I agree, you will need gage or similar tools to pinpoint the significantly perturbed pathways first. The results can be easily piped into Pathview for automatic visualization. With your input data ready, you may finish the whole workflow in about 10 lines of code. Please check the “Integrated workflow with pathway analysis” section on page 15 of the Pathview vignette.
> It is always a good idea to keep molecular (gene, compound) IDs unique in your data, as you’ve already done by summing over the KO ids. GAGE (or similar pathway/gene-set analysis tools) requires unique gene/molecule IDs for sensible enrichment tests. In addition, R may force your data IDs (names for vectors and rownames for matrix-like objects) to be unique by adding suffixes to your duplicated
> Graphviz view looks quite different from KEGG view. Graphviz view lays out the pathway topology automatically; users have little control over that. KEGG view uses the native KEGG pathway graph, which was designed and drawn entirely by hand. I am curious what types of node attributes you want to manipulate? What do you mean by “plot actual data next to nodes”: using discrete legends rather than a color scale?
> Weijun
> --------------------------------------------
> On Fri, 7/26/13, Iain  wrote:
>  Subject: Re: Pathview with non-KEGG organisms?

>  Date: Friday, July 26, 2013, 7:16 PM
>  Hi Weijun,
>  Thanks for your email. I ended up summing over my KO ids to add duplicates instead of using the mol.sum function (which I think does the same thing). I did this because I had instances where my custom ids had the same KO id. I got things working, but I actually think I should start with your gage package first to try to narrow down what I visualize.
>  Another quick question - is it possible to have the graphviz option (kegg.native = F) maintain the general graph structure that kegg.native = T displays? I would like to access functionality of graphviz by being able to manipulate node attributes, while still keeping the canonical flow of a metabolic pathway. Also, is it possible to plot actual data next to nodes instead of using the color scale to represent values?
>  Thanks again for your help,
>  Iain
>  On Fri, Jul 26, 2013 at 10:48 AM, Luo Weijun wrote:
>  > Hi Iain,
>  > Yes, pathview can work with your problem. First map your genes to KEGG Orthology, and retrieve the KEGG ortholog IDs (gene IDs in the format of Kxxxxx) (as you may have done). Just label your genes using these KEGG ortholog IDs (instead of Entrez Gene IDs or gene symbols). Then supply your data as gene.data, and set species="ko" when calling the pathview function. Otherwise it would be the same as working with KEGG species data. Please check the help info for the pathview function within R:
>  > ?pathview
>  > And look at the Arguments section (gene.data, species) and the Details section.
>  >
>  > Pathview can also be applied directly to metagenomic or microbiome data when the data are mapped to KEGG ortholog IDs. In fact, pathview can visualize various types of molecular data as long as the data can be mapped onto pathways. Pathview automatically maps gene/protein/compound IDs to KEGG molecular IDs for common species. For less used IDs or other species, pathview will also work if the user provides the ID mapping data. Please check pages 13-14 in the package vignette for pathview’s ID mapping functions and solutions.
>  > Weijun
>  >
>  >
>  > On Fri, 7/26/13, Iain wrote:
>  >
>  >  Subject: Pathview with non-KEGG organisms?
>  >  Date: Friday, July 26, 2013, 1:42 AM
>  >
>  >  Hey Weijun,
>  >
>  >  I've been looking for tools that allow RNA-seq data to be overlaid on KEGG pathways. The problem is that the bacterium I work on is not a KEGG organism. I have a draft genome and I have used KAAS to find KEGG Orthology assignments for each of the genes. Is it possible, somehow, to still use Pathview? For example, instead of calling the pathview function with species = "hsa", would it be possible to provide a custom set of KO assignments?
>  >
>  >  Cheers,
>  >  Iain
>  >
