[BioC] DESeq2 with GAGE

Luo Weijun luo_weijun at yahoo.com
Wed Nov 13 20:11:04 CET 2013


Hi Aric,
You mapped your reads to Ensembl genes instead of Entrez Gene. Therefore, you gene ID looks like ENSG00000204248. The gene set data provided or generated by gage package use Entrez Gene IDs (or KEGG gene IDs for minor speices) by default. Therefore, none of your genes mapped to the pathways, hence you got NA’s.
You may mapped that to Entrez Gene as I did in the step 1, i.e. using TxDb.Hsapiens.UCSC.hg19.knownGene as your gene models. Or you may convert your Ensembl gene ID to Entrez Gene ID. pathview package has two functions, id2eg and mol.sum, help you with the ID conversion and redundant ID merging. You may check the help info by:
?id2eg

HTHs.
Weijun

--------------------------------------------
On Tue, 11/12/13, Aric wrote:

 Subject: DESeq2 with GAGE

 Date: Tuesday, November 12, 2013, 8:42 PM

 Weijun,

 I am attempting to use GAGE for kegg analysis with RNA-seq
 data that has
 been analyzed with DESeq2. I am following the workflow
 example from
 October 7, 2013, Section 6.1, and am having some problems.
 The problem
 is that there are no results with GAGE and I believe it is a
 problem
 with the input file. 

 The input to `gage` is just a list of genes with their
 associated
 log2FoldChange. 

 # create DESeq2 results:
 ...
 dds <- DESeq(dds)
 # extract results alone
 deseq2.res <- results(dds)
 # extract just log2FoldChange
 deseq2.fc <- deseq2.res$log2FoldChange
 names(deseq2.fc) <- rownames(deseq2.res)
 exp.fc = res.fc

 Example result of exp.fc:
 > head(exp.fc)
 ENSG00000204248 ENSG00000256195 ENSG00000108797
 ENSG00000225964 
        5.745541     
   5.589081        4.727203 
       5.253303

 I am then to run gage as such, 

 > fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref =
 NULL, samp = NULL)

 and this is where it is confusing. I have already given
 `gage` the results
 of the analysis, just a list of genes with their associated
 log2FoldChanges. It seems that `gage` is looking for
 expression of genes
 across samples, but clearly the above workflow does not
 provide
 that. Here is an example of the output from the above `gage`
 command:

 > fc.kegg.p
 $greater
                
                
                
                
     p.geomean stat.mean p.val q.val set.size exp1
 hsa00010 Glycolysis / Gluconeogenesis     
                
                
 NA       NaN    NA 
   NA        0   NA
 hsa00020 Citrate cycle (TCA cycle)     
                
                
    NA   
    NaN    NA    NA 
       0   NA
 ...
 $less
                
                
                
                
      p.geomean stat.mean p.val q.val
 set.size exp1
 hsa00010 Glycolysis / Gluconeogenesis     
                
                
 NA       NaN    NA 
   NA        0   NA
 ...
 $stats
                
                
            stat.mean
 exp1
 hsa00010 Glycolysis / Gluconeogenesis     
     NaN   NA
 hsa00020 Citrate cycle (TCA cycle)     
        NaN   NA
 ...

 I don't really understand how it can generate any p-values
 from the
 input data. Clearly I am missing something, but I don't see
 any `gage`
 input that does not want gene counts as exprs and a list of
 controls
 versus conditions. Any help would be greatly appreciated. 

 Thanks, Aric



More information about the Bioconductor mailing list