[BioC] An. Gambie RNA-Seq differential expression using gage and pathview

Luo Weijun luo_weijun at yahoo.com
Thu Jul 3 16:14:17 CEST 2014


You had some problem with your kegg gene set data. What you did:
kegg.gs <- kegg.gsets(species = "aga", id.type = "kegg")
cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp =
samp.idx, compare ="paired")

kegg.gsets produces kegg gene sets with some meta-data. You need to process the right gene set data out before calling gage:
kg.aga=kegg.gsets("aga")
kegg.gs=kg. aga$kg.sets[kg. aga$sigmet.idx]
cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp =
samp.idx, compare ="paired")
…

You can check the details on your gene set data:
names(kg.aga)
lapply(kg.aga, head, 3)
lapply(kegg.gs[1:3], head, 3)

you may also check the function documentation:
?kegg.gsets
Actually you will find everything in the pacakge tutorials and documentations for functions you work with like gage or pathview etc:
http://bioconductor.org/packages/release/bioc/html/gage.html
http://bioconductor.org/packages/release/bioc/html/pathview.html

 ----------------------------------------------
 Saul Lozano wrote:

 ​Dear bioconductor members,

   I have completed the "RNA-Seq Data Pathway and Gene-set
 Analysis
 Workflows" tutorial and have obtained the output png files
 ​ for the human example datasets​.

  Having completed
 ​the tutorial, I started working with my mosquito
 data.
 ​I aligned my reads using gmap-gsnap using "
 https://www.vectorbase.org/download/anopheles-gambiae-pestchromosomesagamp4fagz"
 and created the Transcription Database object using its
 corresponding
 annotation files that can be find at "
 https://www.vectorbase.org/download/anopheles-gambiae-pestbasefeaturesagamp41gff3gz
 "
 ​ ​
   I have completed the pre-processing and normalization,
 the list of
 normalized reads per gene looks like this

 ​>cnts.norm​

 ...
 AGAP009385        1.729138e+01        2.222712e+03 
       8.552872e+02
 AGAP009386        3.094247e+01        1.681387e+02 
       7.808745e+01
 AGAP009387        1.274102e+02        6.561511e+01 
       7.441275e+01
 AGAP009388        1.695465e+03        3.033674e+03 
       1.940243e+03
 AGAP009389        1.071155e+03        1.202602e+03 
       9.646097e+02
 AGAP009390        0.000000e+00        2.050472e+00 
       1.837352e+00
 AGAP009391        0.000000e+00        0.000000e+00 
       0.000000e+00
 AGAP009392        0.000000e+00        0.000000e+00 
       0.000000e+00
 AGAP009393        1.820145e+00        0.000000e+00 
       0.000000e+00
 AGAP009394        0.000000e+00        2.050472e+00 
       0.000000e+00
 ...

 ​the structure of the object is:

 > str(cnts.norm)
  num [1:12489, 1:12] 1857 3076 1390 4953 5027 ...
  - attr(*, "dimnames")=List of 2
   ..$ : chr [1:12489] "AGAP000002" "AGAP000005"
 "AGAP000007" "AGAP000008"
 ...
   ..$ : chr [1:12] "library10.sorted.bam"
 "library11.sorted.bam"
 "library12.sorted.bam" "library1.sorted.bam" ...

 I load the kegg pathway database using kegg.gsets like this

 >kegg.gs <- kegg.gsets(species = "aga", id.type =
 "kegg")

 but the accession number are different in the Transcription
 database and
 the kegg.

 > kegg.gs
 ...
 $kg.sets$`aga04745 Phototransduction - fly`
  [1] "AgaP_AGAP000028" "AgaP_AGAP000348" "AgaP_AGAP000651"
 "AgaP_AGAP000945"
  [5] "AgaP_AGAP001506" "AgaP_AGAP001936" "AgaP_AGAP002145"
 "AgaP_AGAP005079"
  [9] "AgaP_AGAP005095" "AgaP_AGAP006263" "AgaP_AGAP006475"
 "AgaP_AGAP008435"
 [13] "AgaP_AGAP009115" "AgaP_AGAP009730" "AgaP_AGAP009953"
 "AgaP_AGAP010630"
 [17] "AgaP_AGAP010957" "AgaP_AGAP011099" "AgaP_AGAP011414"
 "AgaP_AGAP011516"
 [21] "AgaP_AGAP012026" "AgaP_AGAP012252"
 ...
 The gage tutorial mentions that the names of the genes in
 the counts and
 the kegg object should be the same, so I changed the
 rownames of the counts
 by adding "Agap_"

 > x <- data.frame(rownames(cnts.norm),
 stringsAsFactors=FALSE)
 > x$new <- paste("AgaP_", x[,1],sep="")
 > row.names(cnts.norm) <- x$new

 so the list now looks like this

 ...
 AgaP_AGAP009385        1.729138e+01       
 2.222712e+03        8.552872e+02
 AgaP_AGAP009386        3.094247e+01       
 1.681387e+02        7.808745e+01
 AgaP_AGAP009387        1.274102e+02       
 6.561511e+01        7.441275e+01
 AgaP_AGAP009388        1.695465e+03       
 3.033674e+03        1.940243e+03
 AgaP_AGAP009389        1.071155e+03       
 1.202602e+03        9.646097e+02
 AgaP_AGAP009390        0.000000e+00       
 2.050472e+00        1.837352e+00
 AgaP_AGAP009391        0.000000e+00       
 0.000000e+00        0.000000e+00
 AgaP_AGAP009392        0.000000e+00       
 0.000000e+00        0.000000e+00
 AgaP_AGAP009393        1.820145e+00       
 0.000000e+00        0.000000e+00
 AgaP_AGAP009394        0.000000e+00       
 2.050472e+00        0.000000e+00
 ...

 In theory everything is set for the gage analysis like this

 > cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref
 = ref.idx, samp =
 samp.idx, compare ="paired")

 but I get

 > cnts.kegg.p
 $greater
            p.geomean stat.mean p.val q.val set.size
 library2.sorted.bam
 kg.sets           NA       NaN    NA    NA 
       2                  NA
 sigmet.idx        NA       NaN    NA    NA   
     0                  NA
 sig.idx           NA       NaN    NA    NA 
       0                  NA
 met.idx           NA       NaN    NA    NA 
       0                  NA
 dise.idx          NA       NaN    NA    NA 
       0                  NA
            library5.sorted.bam
 kg.sets                     NA
 sigmet.idx                  NA
 sig.idx                     NA
 met.idx                     NA
 dise.idx                    NA

 $less
            p.geomean stat.mean p.val q.val set.size
 library2.sorted.bam
 kg.sets           NA       NaN    NA    NA 
       2                  NA
 sigmet.idx        NA       NaN    NA    NA   
     0                  NA
 sig.idx           NA       NaN    NA    NA 
       0                  NA
 met.idx           NA       NaN    NA    NA 
       0                  NA
 dise.idx          NA       NaN    NA    NA 
       0                  NA
            library5.sorted.bam
 kg.sets                     NA
 sigmet.idx                  NA
 sig.idx                     NA
 met.idx                     NA
 dise.idx                    NA

 $stats
            stat.mean library2.sorted.bam
 library5.sorted.bam
 kg.sets          NaN                  NA   
               NA
 sigmet.idx       NaN                  NA   
               NA
 sig.idx          NaN                  NA   
               NA
 met.idx          NaN                  NA   
               NA
 dise.idx         NaN                  NA   
               NA

 My questions are, how do I debug this problem? is the gene
 name (accession
 number) change correct? Are "NaN"s the result of zeros in
 the datasets? any
 help will be greatly appreciated.

 Best regards, Saul

     [[alternative HTML version deleted]]



More information about the Bioconductor mailing list