[BioC] GAGE/Pathview analysis data preparation
Luo Weijun
luo_weijun at yahoo.com
Wed Aug 13 22:45:38 CEST 2014
Nick,
Assuming I understand you correctly, the rows and columns of your data matrix are genes and samples respectively, and the data are log2 transformed. Say you have column indices like:
ref1=1:4
samp1=5:8
samp1.1=9:11
…
If you want to do two-state comparison in pathview plots, you may derive the differential expression (log2 ratios) like:
#paired data
samp1.d=exp.mat[, samp1]-exp.mat[,ref1]
#unpaired data
samp1.d=exp.mat[, samp1]-rowMeans(exp.mat[,ref1])
samp1.1.d=exp.mat[, samp1.1]-rowMeans(exp.mat[,ref1])
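To make the two cases above concrete, here is a minimal sketch on a made-up 3-gene matrix. The toy exp.mat and its values are assumptions for illustration only, not your data. The paired form needs samp and ref groups of equal length, while the unpaired form subtracts each gene's mean reference value, so it works for any group size:

```r
# Toy log2 expression matrix: 3 genes x 11 samples (hypothetical values)
set.seed(1)
exp.mat <- matrix(rnorm(3 * 11, mean = 8), nrow = 3,
                  dimnames = list(c("g1", "g2", "g3"), paste0("s", 1:11)))
ref1 <- 1:4
samp1 <- 5:8
samp1.1 <- 9:11

# paired: column-by-column differences, needs length(samp1) == length(ref1)
samp1.d.paired <- exp.mat[, samp1] - exp.mat[, ref1]

# unpaired: subtract each gene's mean reference value from every sample
# (a vector of length nrow is recycled down the columns, i.e. per gene)
ref1.mean <- rowMeans(exp.mat[, ref1])
samp1.d <- exp.mat[, samp1] - ref1.mean
samp1.1.d <- exp.mat[, samp1.1] - ref1.mean  # fine with only 3 arrays

dim(samp1.d.paired)
dim(samp1.1.d)
```

Since samp1.1 has 3 arrays and ref1 has 4, only the unpaired form applies to that group.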
If you want to do multiple-state comparison in pathview plots, you may derive the differential expression (log2 ratios) like:
samp.d=cbind(exp.mat[,c(samp1,samp1.1)]-rowMeans(exp.mat[,ref1]), exp.mat[,c(samp2,samp2.1)]-rowMeans(exp.mat[,ref2]))
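As a rough sketch of the multiple-state case, the following builds such a matrix for a 23-array layout like the one described in the question. The random values and the column indices for ref2, samp2 and samp2.1 are hypothetical, chosen only so that the example runs on its own:

```r
# Toy log2 expression matrix: 3 genes x 23 samples (hypothetical values)
set.seed(2)
exp.mat <- matrix(rnorm(3 * 23, mean = 8), nrow = 3,
                  dimnames = list(c("g1", "g2", "g3"), paste0("s", 1:23)))

# Column indices; the second batch is an assumed layout
ref1 <- 1:4
samp1 <- 5:8
samp1.1 <- 9:11
ref2 <- 12:15
samp2 <- 16:19
samp2.1 <- 20:23

# Each batch is compared against the mean of its own reference group
samp.d <- cbind(
  exp.mat[, c(samp1, samp1.1)] - rowMeans(exp.mat[, ref1]),
  exp.mat[, c(samp2, samp2.1)] - rowMeans(exp.mat[, ref2])
)

dim(samp.d)        # genes x all non-reference samples
colnames(samp.d)
```

The resulting matrix has one column per non-reference sample and can be supplied as gene.data to pathview, together with a pathway.id and species = "mmu" for mouse.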
BTW, the latest versions are pathview 1.4.2 and gage 2.14.4, where you can find all the updated features, including the go.gsets function:
library(gage)
?go.gsets
Package links:
http://bioconductor.org/packages/release/bioc/html/gage.html
http://bioconductor.org/packages/release/bioc/html/pathview.html
HTH.
Weijun
--------------------------------------------
On Fri, 8/8/14, Nick wrote:
Hello sir,
As a new user I was wondering about a few things, listed below.
1) I am using mogene20st arrays. To prepare the data with the gage function, gsets are required. I used kegg.sets.mm, which I hope is the correct gene set collection for mouse, and for GO terms I used go.sets.mm. Both are located in the gageData package. If you have another/specific option, particularly for this chip, please do let me know. Example code is below:
data1.kegg.p.egid <- gage(signals.egid, gsets= kegg.sets.mm, ref= ref1, samp= samp1)  # signals.egid: raw expression signals
2) My expression data consist of multiple arrays: ref 1 (4 arrays), sample 1 (4 arrays) and sample 1.1 (3 arrays); ref 2 (4 arrays), sample 2 (4 arrays) and sample 2.1 (4 arrays). That is, every group except sample 1.1 has 4 arrays, so 23 arrays in total.
Now I want to prepare my data sets for KEGG pathways and I followed "Generally Applicable Gene-set/Pathway Analysis", but on page 18 it says I need to supply expression changes and a target pathway. The target pathway is not a problem, but I have difficulties in preparing my data for pathview.
First, I was wondering how it will calculate the significantly expressed gene(s) for a pathway with only raw expression values. Secondly, based on the above information, how will it plot the KEGG pathway?
Technical issue: I have 23 arrays, so I tried to subtract ref 1 from samples 1 and 1.1, and ref 2 from samples 2 and 2.1, respectively, but I get an error.
Given code (the starting data set for KEGG pathway analysis, as I understand it so far):
gse16873.d <- gse16873[, dcis] - gse16873[, hn]
My code:
signals.egid.dataforkegg <- signals.egid[,c(ED8,M8,ED12,M12)] - signals.egid[,c(con8,con8,con12,con12)]
Error in Ops.data.frame(signals.egid[, c(ED8, M8, ED12, M12)], signals.egid[, :
  ‘-’ only defined for equally-sized data frames
OR
> signals.egid.dataforkegg <- gagePrep(signals.egid, ref= c(con8,con8,con12, con12), samp= c(ED8, M8, ED12, M12))
Error in gagePrep(signals.egid, ref = c(con8, con8, con12, con12), samp = c(ED8, :
  please make sure 'ref' and 'samp' are comparable and of equal length or compare='unpaired'
OR
signals.egid.dataforkegg <- gagePrep(signals.egid, compare= 'unpaired')
Error in gagePrep(signals.egid, compare = "unpaired") :
  improper 'compare' argument value
As I understand it so far, the error could be because sample 1.1 has only 3 arrays while the rest have 4 each. Can you please advise how I should prepare my data for pathway analysis with pathview?
3) There are more functions in the gage package, such as gagePrep and gagePipe; I would like to know how relevant the output from these functions is for pathview. Looking at the quick start with demo data in the pathview manual, the starting data is gse16873.d, which is the output of the code given on page 21 of the pathview manual. Does gagePrep give the same results as subtracting ref from sample?
gse16873.d <- gagePrep(gse16873, gsets= kegg.gs, ref= hn, samp= dcis)
head(gse16873.d[1:2,])
          DCIS_1     DCIS_2      DCIS_3      DCIS_4       DCIS_5     DCIS_6
10000 -0.3076448 -0.1472277 -0.02378481 -0.07056193 -0.001323087 -0.1502681
10001  0.4158680 -0.3347726 -0.51313691 -0.16653712  0.111122223  0.1340073