[BioC] Pathview with non-KEGG organism

Mon Sep 16 17:48:11 CEST 2013

Christian,
You’ve done the gene ID mapping to KO correctly. To proceed with the GAGE pathway analysis, you will need the KO gene set data, which I will send you next. The KO gene set data will also be provided in the next release of the gageData package.
To see whether KEGG includes your research species, you may check:
library(pathview)
data(korg)
head(korg)
If it is included, you don’t really have to map your gene ID to KO given that you can get the corresponding gene set data.
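
A minimal sketch of that lookup, assuming korg is a character matrix with columns named kegg.code and scientific.name (column names may differ between pathview versions). The helper find_species and the toy table are hypothetical and only for illustration; with pathview loaded you would pass the real korg instead.

```r
# Hypothetical helper: return the rows of a korg-like table whose
# scientific name matches a pattern (case-insensitive).
find_species <- function(korg_table, pattern,
                         name_col = "scientific.name") {
  hits <- grepl(pattern, korg_table[, name_col], ignore.case = TRUE)
  korg_table[hits, , drop = FALSE]
}

# Toy stand-in for korg (a made-up subset, not the real table).
toy_korg <- cbind(
  kegg.code       = c("hsa", "dre", "mmu"),
  scientific.name = c("Homo sapiens", "Danio rerio", "Mus musculus")
)

find_species(toy_korg, "danio")        # one matching row (zebrafish)
nrow(find_species(toy_korg, "salmo"))  # zero rows: not in the toy table

# Assumed usage with pathview installed:
# library(pathview); data(korg); find_species(korg, "Salmo salar")
```

An empty result would suggest the species is not covered, in which case the KO route described above is the way to go.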

As you have multiple samples/replicates, you may choose to visualize either the average gene expression of all samples together or each individual sample separately using Pathview. From the next devel release (version 1.17), Pathview will also be able to integrate/plot multiple states/samples on the same graph by splitting each node: http://bioconductor.org/packages/devel/bioc/html/pathview.html. So stay tuned.
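
The averaging option can be sketched in base R as follows. The matrix values are made up, and the commented pathview call at the end is an assumed usage (the pathway ID "00010" is only an example, and the call needs the package plus network access), so treat this as an illustration rather than a tested recipe.

```r
# Toy matrix shaped like gene.data: KO IDs as row names, one column per
# sample. Values are made up for illustration.
expr <- matrix(
  c( 0.7,  0.5,  0.1,  0.2, 0.2, -0.5,
    -0.8, -0.6, -0.8, -0.2, 0.0,  0.2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("K00006", "K00008"),
                  c("DIET14", "DIET14", "DIET14", "DIET14",
                    "DIET02", "DIET02"))
)

# Treatment label for each column (here simply the column names).
group <- factor(colnames(expr))

# Average the replicates within each treatment: one column per group.
group_means <- sapply(levels(group), function(g) {
  rowMeans(expr[, group == g, drop = FALSE])
})
group_means

# Assumed usage with pathview (not run here):
# pathview(gene.data = group_means, pathway.id = "00010", species = "ko")
```

Passing the unaveraged matrix instead would correspond to the per-sample option.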
HTH.
Weijun

--------------------------------------------
On Mon, 9/16/13, Christian De Santis <christian.desantis at stir.ac.uk> wrote:

 Subject: Pathview with non-KEGG organism
 To: "'bioconductor at r-project.org'" <bioconductor at r-project.org>

 Date: Monday, September 16, 2013, 4:44 AM

 Hi Weijun,

 I am new to BIOC and Pathview/Gage packages. I am
 analysing microarray data from an experiment on Atlantic
 salmon and I am attempting to visualize the results in
 Pathview, if possible. 

 Following up on a previous thread (https://stat.ethz.ch/pipermail/bioconductor/2013-August/054161.html),
 I have been trying to do something similar, and I believe I
 face a similar limitation. Like the previous user, I obtained
 KEGG Orthology annotation using KAAS. Briefly, the principal
 steps of my workflow are as follows:

 > DIET12_14_KO <- read.csv("DIET12_14_KO.csv",header=T, sep=",") # Upload the KEGG annotation file from KAAS
 > DIET12_14_KO[1:3,]
      ProbeName     KO
 1 Omy#AB024321 K04079
 2 Omy#BG360545 K13506
 3 Omy#BX072887 K00412
 > MAlist[1:3,1:6] # Visualize my expression list
                   DIET14    DIET14.1   DIET14.2   DIET14.3      DIET02    DIET02.1
 Omy#AB024321  0.06296557  0.08865075  0.1186315 -0.1847021 -0.41212414 -0.42385673
 Omy#BG360545 -0.50762181 -0.35763304 -0.4939668 -0.6973216 -0.11339368  0.15489712
 Omy#BX072887  0.23447458  0.22487856  0.3930821  0.1515031 -0.04694996 -0.04836203
 > dim(MAlist)
 [1] 7955   16
 > D2 <- as.matrix(DIET12_14_KO) # create the two column character matrix for id.map argument
 > D2[1:3,]
      ProbeName      KO
 [1,] "Omy#AB024321" "K04079"
 [2,] "Omy#BG360545" "K13506"
 [3,] "Omy#BX072887" "K00412"
 > gene.data <- mol.sum(MAlist, id.map = D2)
 > gene.data[1:3,1:6]
            DIET14     DIET14     DIET14     DIET14       DIET02     DIET02
 K00006  0.7170382  0.5351467  0.1207924  0.1782242  0.228860514 -0.5426538
 K00008 -0.8112601 -0.5910453 -0.7691811 -0.1919992 -0.003848065  0.1771637
 K00011  1.9645823  1.2305297  2.3335377  1.4813718  0.185036373 -1.2886788
 > dim(gene.data)
 [1] 2449   16
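
The drop in row count (7955 probes to 2449 KO IDs) reflects probes without a KO being dropped and probes sharing a KO being collapsed into one row. The base R sketch below illustrates that idea on made-up data by summing rows per KO. It is not pathview's implementation; see ?mol.sum for how the package actually summarises probes.

```r
# Toy expression matrix: probes as rows, samples as columns (made-up values).
probes <- matrix(
  c(1, 2,
    3, 4,
    5, 6,
    7, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3", "p4"), c("S1", "S2"))
)

# Two-column probe-to-KO map; p4 has no KO annotation and is absent.
id_map <- cbind(ProbeName = c("p1", "p2", "p3"),
                KO        = c("K00001", "K00001", "K00002"))

# Keep only map entries whose probe is present in the expression matrix.
id_map <- id_map[id_map[, "ProbeName"] %in% rownames(probes), , drop = FALSE]

# Select the mapped probes, then sum rows that share a KO ID.
mapped  <- probes[id_map[, "ProbeName"], , drop = FALSE]
ko_data <- rowsum(mapped, group = id_map[, "KO"])
ko_data
```

Here p1 and p2 collapse into K00001 and p4 is dropped, so four probe rows become two KO rows.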

 I am a bit stuck here. I should now have the data in
 the correct format for the pathview argument
 “gene.data”, with genes as rows, samples as
 columns, and KO IDs as row names. From my understanding, to
 proceed I will now need KO gene set data for a non-model
 species? Or could I use one from a closely related species
 like zebrafish?

 Also, one thing that is not clear to me is whether gene.data
 should include the expression values of all samples (i.e.
 biological replicates) or the average value per
 treatment.

 Your help will be very much appreciated.

 Regards,

 Christian  
