[BioC] GAGE/Pathview analysis data preparation

Wed Aug 13 22:45:38 CEST 2014

Nick,
Assuming I understand you correctly, the rows and columns of your data matrix are genes and samples, respectively, and the data are log2 transformed. Say you have column indices like:
ref1=1:4
samp1=5:8
samp1.1=9:11
…

If you want to do two-state comparison in pathview plots, you may derive the differential expression (log2 ratios) like:
#paired data
samp1.d=exp.mat[, samp1]-exp.mat[,ref1]
#unpaired data
samp1.d=exp.mat[, samp1]-rowMeans(exp.mat[,ref1])
samp1.1.d=exp.mat[, samp1.1]-rowMeans(exp.mat[,ref1])
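
As a minimal sketch of the two cases above, here is a toy example. The matrix values, its dimensions and the name samp1.paired.d are made up for illustration; only the index names come from the lines above. It shows that the unpaired form works when the groups differ in size (4 reference arrays versus 3 sample arrays), while the paired form needs equal column counts:

```r
# Toy log2 expression matrix: 5 genes x 11 samples (made-up values).
set.seed(1)
exp.mat <- matrix(rnorm(5 * 11, mean = 8), nrow = 5,
                  dimnames = list(paste0("gene", 1:5), paste0("s", 1:11)))
ref1 <- 1:4
samp1 <- 5:8
samp1.1 <- 9:11

# Paired: needs the same number of ref and samp columns, matched by position.
samp1.paired.d <- exp.mat[, samp1] - exp.mat[, ref1]

# Unpaired: subtract the per-gene reference mean from every sample column.
# The length-5 vector is recycled down each column, so any group size works.
ref1.mean <- rowMeans(exp.mat[, ref1])
samp1.d <- exp.mat[, samp1] - ref1.mean
samp1.1.d <- exp.mat[, samp1.1] - ref1.mean

dim(samp1.paired.d)  # genes x 4 samples
dim(samp1.1.d)       # genes x 3 samples
```

If the expression data are stored in a data frame, converting with as.matrix() first may avoid the "equally-sized data frames" error when subtracting column blocks of different widths.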

If you want to do multiple-state comparison in pathview plots, you may derive the differential expression (log2 ratios) like:
samp.d=cbind(exp.mat[,c(samp1,samp1.1)]-rowMeans(exp.mat[,ref1]), exp.mat[,c(samp2,samp2.1)]-rowMeans(exp.mat[,ref2]))
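
A sketch of the multiple-state case on toy data follows. The column positions for ref2, samp2 and samp2.1 are hypothetical (they were not spelled out above), and the matrix values are made up. The pathview call is left commented out because it needs real Entrez Gene IDs as row names and network access; the pathway ID and output suffix there are placeholders:

```r
# Toy log2 expression matrix: 5 genes x 23 arrays (made-up values).
set.seed(2)
exp.mat <- matrix(rnorm(5 * 23, mean = 8), nrow = 5,
                  dimnames = list(paste0("gene", 1:5), paste0("s", 1:23)))

# Column indices; the second block (ref2, samp2, samp2.1) is an assumed layout.
ref1 <- 1:4;   samp1 <- 5:8;   samp1.1 <- 9:11
ref2 <- 12:15; samp2 <- 16:19; samp2.1 <- 20:23

# Each sample block is compared against the mean of its own reference group.
samp.d <- cbind(
  exp.mat[, c(samp1, samp1.1)] - rowMeans(exp.mat[, ref1]),
  exp.mat[, c(samp2, samp2.1)] - rowMeans(exp.mat[, ref2])
)
dim(samp.d)  # genes x 15 sample columns

# With real data (Entrez Gene IDs as row names), something like:
# library(pathview)
# pv.out <- pathview(gene.data = samp.d, pathway.id = "04110",
#                    species = "mmu", out.suffix = "multi.state")
```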

BTW, the latest versions are pathview 1.4.2 and gage 2.14.4, which include all updated features, including the go.gsets function:
library(gage)
?go.gsets
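
For mouse data, go.gsets might be used roughly as sketched below to generate up-to-date GO gene sets. This assumes the gage and org.Mm.eg.db packages are installed; the block skips the call otherwise. The variable names have.pkgs, go.mm and go.bp are just illustrative:

```r
# Check that the required packages are available before calling go.gsets.
have.pkgs <- requireNamespace("gage", quietly = TRUE) &&
  requireNamespace("org.Mm.eg.db", quietly = TRUE)

go.bp <- NULL
if (have.pkgs) {
  suppressPackageStartupMessages(library(gage))
  # Build GO gene sets for mouse, keyed by Entrez Gene ID.
  go.mm <- go.gsets(species = "mouse")
  # Keep only the Biological Process sets.
  go.bp <- go.mm$go.sets[go.mm$go.subs$BP]
}
```

The resulting go.bp list could then be passed as the gsets argument of gage in place of a precompiled gene set list.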

Package links:
http://bioconductor.org/packages/release/bioc/html/gage.html
http://bioconductor.org/packages/release/bioc/html/pathview.html

HTH.
Weijun

--------------------------------------------
On Fri, 8/8/14, Nick wrote:

 Hello sir,
 As a new user, I was wondering about a few things, listed below.
 1) I am using mogene20st arrays. To prepare the data with the gage
 function, which requires gsets, I used kegg.sets.mm for the mouse
 KEGG gene sets and go.sets.mm for GO terms, both located in the
 gageData package. I hope these are the correct gsets. If you have
 another option specific to this chip, please let me know. Example
 code is below:

 # signals.egid holds the raw expression signals
 data1.kegg.p.egid <- gage(signals.egid, gsets = kegg.sets.mm, ref = ref1, samp = samp1)

 2) My expression data consist of multiple arrays: ref 1 (4 arrays),
 sample 1 (4 arrays) and sample 1.1 (3 arrays); ref 2 (4 arrays),
 sample 2 (4 arrays) and sample 2.1 (4 arrays). All groups except
 sample 1.1 have 4 arrays, so 23 arrays in total.

 Now I want to prepare my data sets for KEGG pathway analysis. I
 followed the Generally Applicable Gene-set/Pathway Analysis
 vignette, but page 18 says I need to supply expression changes and
 a target pathway. The target pathway is not a problem, but I have
 difficulties preparing my data for pathview.

 First, I was wondering how it will identify the significantly
 expressed genes for a pathway from raw expression values alone.
 Second, given the above, how will it plot the KEGG pathway?

 Technical issue: I have 23 arrays, so I tried to subtract ref 1 from
 samples 1 and 1.1, and ref 2 from samples 2 and 2.1, respectively,
 but I get an error.
 Given code (the starting data set for KEGG pathway analysis, as I
 understand it so far):
 gse16873.d <- gse16873[, dcis] - gse16873[, hn]

 My code:
 signals.egid.dataforkegg <- signals.egid[, c(ED8, M8, ED12, M12)] - signals.egid[, c(con8, con8, con12, con12)]
 Error in Ops.data.frame(signals.egid[, c(ED8, M8, ED12, M12)], signals.egid[,  :
   - only defined for equally-sized data frames
 OR
 > signals.egid.dataforkegg <- gagePrep(signals.egid, ref = c(con8, con8, con12, con12), samp = c(ED8, M8, ED12, M12))
 Error in gagePrep(signals.egid, ref = c(con8, con8, con12, con12), samp = c(ED8,  :
   please make sure 'ref' and 'samp' are comparable and of equal length or compare='unpaired'

 OR
 signals.egid.dataforkegg <- gagePrep(signals.egid, compare = 'unpaired')
 Error in gagePrep(signals.egid, compare = "unpaired") :
   improper 'compare' argument value
 As I understand it so far, the error could be because sample 1.1
 has only 3 arrays while the rest have 4 each. Can you please advise
 how I should prepare my data for pathway analysis with pathview?

 3) There are more functions in the gage package, such as gagePrep
 and gagePipe. I would like to know how relevant the output of these
 functions is for pathview. In the pathview manual's quick start with
 demo data, the starting data is gse16873.d, which is the output of
 the code given on page 21 of the pathview manual. Does gagePrep give
 the same results as subtracting ref from sample?

 gse16873.d <- gagePrep(gse16873, gsets = kegg.gs, ref = hn, samp = dcis)
 head(gse16873.d[1:2,])

           DCIS_1     DCIS_2      DCIS_3      DCIS_4       DCIS_5     DCIS_6
 10000 -0.3076448 -0.1472277 -0.02378481 -0.07056193 -0.001323087 -0.1502681
 10001  0.4158680 -0.3347726 -0.51313691 -0.16653712  0.111122223  0.1340073