[BioC] Pathview with non-KEGG organisms?
luo_weijun at yahoo.com
Thu Aug 1 19:25:06 CEST 2013
I am attaching the KO gene set data you need for your analysis. This data was generated the same way as the gene set data in the gage and gageData packages. For other users with similar needs, I will also provide this KO gene set data in the development version of the pathview or gageData package soon. For your analysis, you may do something like:
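library(gage)
# load and check the KO gene set data (the attached file, expected to provide ko.sets and sigmet.idx)
# compare sample column 2 against reference column 1, using only the signaling and metabolic pathways
gage.res <- gage(your.data.log2, ref = 1, samp = 2, gsets = ko.sets[sigmet.idx])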
Here I did a log2 transformation on your data, as is common in array and NGS data analysis, although data on the original scale will most likely also give you sensible results. I did not use all KO pathways, only the signaling and metabolic pathways (hence excluding the disease pathways, which may be less relevant for your analysis). Note that I only have 1 control and 1 experiment sample in the above analysis.
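A minimal sketch of the transformation step in base R. It reuses the first five rows of the expression values quoted further down in this thread; the column names and the pseudocount of 1 are assumptions for illustration, not part of the analysis described above.

```r
# Expression values keyed by KO ID (rows copied from the data quoted in this thread)
your.data <- matrix(
  c(1584.06859, 1595.09485, 1437.64499,
     143.25284,  239.21267,  237.12022,
     222.57466,  227.87069,  104.46555,
      40.25286,   28.87049,   34.82185,
     268.74706,  182.50277,  192.34927),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("K00005", "K00012", "K00013", "K00014", "K00018"),
                  c("s1", "s2", "s3"))  # hypothetical column names
)

# log2 transform; the +1 pseudocount is an assumption that keeps zeros finite
your.data.log2 <- log2(your.data + 1)

# With ref = 1 and samp = 2, the per-gene change is column 2 minus column 1
log2.fc <- your.data.log2[, 2] - your.data.log2[, 1]
```

The row names stay attached to log2.fc, which matters because both gage and pathview match data to pathways by these IDs.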
Once you get the significant pathway list, you may plug that with your.data.log2 into pathview for visualization.
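A rough sketch of that hand-off, under stated assumptions: the table below is a made-up stand-in for the `greater` element of a gage result, the 0.1 q-value cutoff is an arbitrary choice, and view.ko.pathways is a hypothetical helper name. The pathview call is wrapped in a function and not executed, since it needs the pathview package and access to KEGG.

```r
# Mock of the `greater` table returned by gage (values are made up for illustration)
greater <- matrix(
  c(0.001, 0.01,
    0.200, 0.40,
    0.004, 0.03),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("ko00010 Glycolysis / Gluconeogenesis",
                    "ko00020 Citrate cycle (TCA cycle)",
                    "ko04010 MAPK signaling pathway"),
                  c("p.val", "q.val"))
)

# Keep pathways below a q-value cutoff (0.1 is an assumed threshold)
sel <- !is.na(greater[, "q.val"]) & greater[, "q.val"] < 0.1
path.ids <- rownames(greater)[sel]

# Pull the five-digit pathway number out of names like "ko00010 Glycolysis ..."
path.nums <- sub("^ko([0-9]{5}).*$", "\\1", path.ids)

# Hypothetical helper: draw each selected pathway with KO-labelled data.
# Defined only, not called here.
view.ko.pathways <- function(gene.data, ids) {
  lapply(ids, function(pid) {
    pathview::pathview(gene.data = gene.data, pathway.id = pid, species = "ko")
  })
}
```

With real results you would pass your log2 data (or log2 differences) and path.nums to the helper, for example view.ko.pathways(your.data.log2, path.nums).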
The KEGG gene sets in the gage package are for human. I also provided KEGG and GO gene sets for several major research species in the gageData package. You may want to take a look there if you need to work on other species.
For viewing multiple experiments on a KEGG pathway, I’ve been wondering whether I should provide that function in pathview. I didn’t, because it looks quite messy whether you divide each node into multiple pieces or draw bar or line plots beside the nodes. I may still provide that function in a future release if I see enough interest in the user community. Currently, you can always generate one graph for each experiment condition.
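One way to script the one-graph-per-condition approach is sketched here. The expression values and treatment names are made up, and view.each.condition is a hypothetical helper; the pathview call inside it is not executed here because it needs the pathview package and access to KEGG.

```r
# Hypothetical log2 values for three treatments, keyed by KO ID (made-up numbers)
expr <- matrix(
  c(10.6, 10.7, 10.5,
     7.2,  7.9,  7.9,
     7.8,  7.8,  6.7),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("K00005", "K00012", "K00013"),
                  c("treatment1", "treatment2", "treatment3"))
)

# One named vector per condition, so each can be drawn on its own graph
per.condition <- lapply(colnames(expr), function(cond) expr[, cond])
names(per.condition) <- colnames(expr)

# Hypothetical helper, defined only: out.suffix keeps the output files
# of the different conditions apart.
view.each.condition <- function(data.list, pathway.id) {
  for (cond in names(data.list)) {
    pathview::pathview(gene.data = data.list[[cond]], pathway.id = pathway.id,
                       species = "ko", out.suffix = cond)
  }
}
```

Using the same color scale limits for every condition would make the resulting graphs easier to compare side by side.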
On Tue, 7/30/13, Iain wrote:
Subject: Re: Pathview with non-KEGG organisms?
Date: Tuesday, July 30, 2013, 1:56 PM
Thanks for the clarification on graphviz.
I'm trying to run gage right now, but am running into a
hurdles. I have my expression data in the proper format:
K00005 1584.06859 1595.09485 1437.64499
K00012 143.25284 239.21267 237.12022
K00013 222.57466 227.87069 104.46555
K00014 40.25286 28.87049 34.82185
K00018 268.74706 182.50277 192.34927
113.65515 77.33168 94.51645
However, I'm a little stuck with the gsets. I
downloaded a KEGG gset
according to the instructions and got something like this:
55902 2645 5232 5230 5162 5160 5161 55276 7167 84532 2203
125 3099 126
3098 3101 127 5224 128 5223 124 230 501 92483 5313 160287
5214 669 5106 5105 219 217 218 10327 8789 5213 5211 3948
2026 441531 131 130 3945 220 221 222 223 224 130589 226 1738
57818 3939 2538 5236 2821
... but now it seems like I have to map these number ids to
the KO ids
in my expression set? Is it not possible to use a similar
before (species="ko") to have gage simply recognize the
Otherwise, I'm not quite sure how to map KO ids to Entrez
the id2eg function doesn't contain a KO id option.
With regard to the node attributes that I'd like...I'm not
sure. I have multiple experimental conditions to represent
pathway (treatment 1, treatment 2, treatment 3). I'd like to
visually compare expression under each treatment on the
suppose this could be represented by bar graphs next to the
this will be messy. Another idea would be to scale edges
their expression values. So have three different color
connecting nodes (one for each treatment) and scale line
according to expression. The problem with the current color
is it can only represent one value (a p-val, or a log fold
If I have 3 expression values (say for a gene, treatment 1 =
treatment 2 = 100, treatment 3 = 10000), I'm trying to think
of way to
compare these visually on the pathway.
Your help and comments are much appreciated.
> I agree, you will need gage or similar tools to
pinpoint the significantly perturbed pathways first. The
results can be easily piped into Pathview for automatic
visualization. With your input data ready, you may finish
the whole workflow in about 10 lines of code. Please check
the “Integrated workflow with pathway analysis” section on page
15 of the Pathview vignette.
> It is always a good idea to keep molecular (gene,
compound) ID unique in your data as you’ve already done by
summing over the KO ids. GAGE (or similar pathway/gene-set
analysis tools) requires unique gene/molecule IDs for
sensible enrichment tests. In addition, R may force your
data IDs (names for vectors and rownames for matrix-like
objects) to be unique by adding suffixes to your duplicated
> Graphviz view looks quite different from KEGG view.
Graphviz view lays out the pathway topology automatically, and
users have little control over that. KEGG view uses the
native KEGG pathway graph, which was designed and drawn
entirely by humans. I am curious what types of node attributes
you want to manipulate? What do you mean by “plot actual
data next to nodes”, by using discrete legends rather than
> On Fri, 7/26/13, Iain wrote:
> Subject: Re: Pathview with non-KEGG organisms?
> Date: Friday, July 26, 2013, 7:16 PM
> Hi Weijun,
> Thanks for your email. I ended up summing over my
KO ids to
> duplicates instead of using the mol.sum function
> think does
> the same thing). I did this because I had
instances where my
> ids had the same KO id. I got things working, but
> think I
> should start with your gage package first to try
> down what I
> Another quick question - is it possible to have
> (kegg.native = F) maintain the general graph
> kegg.native = T displays? I would like to access
> functionality of
> graphviz by being able to manipulate node
> keeping the canonical flow of a metabolic
pathway. Also, is
> possible to plot actual data next to nodes
instead of using
> the color
> scale to represent values?
> Thanks again for your help,
> On Fri, Jul 26, 2013 at 10:48 AM, Luo Weijun
> > Hi Iain,
> > Yes, pathview can work with your problem.
> your genes to KEGG Orthology, and retrieve the
> IDs (gene IDs in the format of Kxxxxx) (as you
> done). Just label your genes using these KEGG
> (instead of Entrez Gene IDs or gene symbols).
> your data as gene.data, and set species="ko" when
> pathview function. Otherwise it would be the same
> with KEGG species data. Please check the help
> pathview function within R:
> > ?pathview
> > And look at the Arguments section
> and Details section.
> > Pathview also can be used directly to
> metagenomic or microbiome data when the data are
> KEGG ortholog IDs. In fact, pathview can
> types of molecular data as long as the data can
> onto pathways. Pathview automatically maps
> gene/protein/compound IDs to KEGG molecular IDs
> species. For less used IDs or other species,
> also work if the user provides the ID mapping
> Please check page 13-14 in the package vignette
> pathview’s ID mapping functions and solutions.
> > Weijun
> > On Fri, 7/26/13, Iain
> > Subject: Pathview with non-KEGG
> > Date: Friday, July 26, 2013, 1:42 AM
> > Hey Weijun,
> > I've
> > been
> > looking for tools that allow RNA-seq
data to be
> overlaid on
> > KEGG
> > pathways. The problem is that the
> work on is not
> > a KEGG
> > organism. I have a draft genome and I
> KAAS to find
> > KEGG
> > Orthology assignments for each of the
> it possible,
> > somehow,
> > to still use Pathview? For example,
> calling the
> > pathview
> > function with species = "hsa", would
> possible to
> > provide a
> > custom set of KO assignments?
> > Cheers,
> > Iain