[BioC] Which genes are in the GO count column?

Tue Aug 21 17:48:33 CEST 2007

Hi Ingrid,

Ingrid H. G. Østensen wrote:
> Hi
> 
> I tried:
> 
> sigLL <- unlist(mget(probe, illuminaHumanv2ENTREZID))
> sigLL <- sigLL[!duplicated(sigLL)]
> 
> insted of:
> 
> sigLL <- unique(unlist(mget(probe, 
> env=illuminaHumanv2ENTREZID,ifnotfound=NA)))
> 
> and then:
> 
> probeSetSummary(hgOver, sigLL)
> 
> Now I get (all?) the GO, not only the ones found in summary(hgOver), and 
> I still get the error message regrading the named vector.

Did you read the help page for probeSetSummary()? If so, you should have 
seen this:

Value:

      A 'list' of 'data.frame'.  Each element of the list corresponds to
      one of the GO terms (the term is provides as the name of the
      element).  Each 'data.frame' has three columns: the Entrez Gene ID
      ('EntrezID'), the probe set ID ('ProbeSetID'), and a 0/1 indicator
      of whether the probe set ID was provided as part of the initial
      input ('selected')

Which should have helped explain the output you get. Anyway, I don't see 
the same problems you do. Are you using a current version of R/BioC (the 
posting guide _does_ ask you to give the output of sessionInfo()).

 > sel <- unlist(mget(ls(illuminaHumanv2ENTREZID)[sample(1:48000, 
300)],illuminaHumanv2ENTREZID, ifnotfound=NA))
 > sel <- sel[!is.na(sel)]
 > sel <- sel[!duplicated(sel)]
 > univ <- unique(getLL(ls(illuminaHumanv2ENTREZID), "illuminaHumanv2"))
 > p <- new("GOHyperGParams", geneIds=sel, universeGeneIds=univ, 
ontology="BP", annotation="illuminaHumanv2", conditional=T)
 > hyp <- hyperGTest(p)
 > ps <- probeSetSummary(hyp, categorySize=10)
 > ps[[1]]
            EntrezID ProbeSetID selected
ILMN_2110       841  ILMN_2110        0
ILMN_29186      841 ILMN_29186        1
ILMN_29639      841 ILMN_29639        0
ILMN_5213    133396  ILMN_5213        1

 > sessionInfo()
R version 2.5.1 (2007-06-27)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] "splines"   "tools"     "stats"
[4] "graphics"  "grDevices" "datasets"
[7] "utils"     "methods"   "base"

other attached packages:
illuminaHumanv2     hgu133plus2         GOstats
         "1.2.0"        "1.16.0"         "2.2.6"
        Category          Matrix         lattice
         "2.2.3"    "0.999375-0"       "0.15-11"
      genefilter        survival            KEGG
        "1.14.1"          "2.32"        "1.16.0"
            RBGL        annotate         Biobase
        "1.12.0"        "1.14.1"        "1.14.0"
              GO           graph        rcompgen
        "1.16.0"        "1.14.2"        "0.1-13"

Best,

Jim

> 
> And writing probeSetSummary object to file with write or write.table 
> does not work.
> 
> Maybe I have to look into the affycoretool package to morrow. :-)
> 
> Regards,
> Ingrid
> Hi Ingrid,
> 
> Ingrid H. G. Østensen wrote:
>  > Hi
>  >
>  > Thanks for tips but I still a bit lost. This is what I have done:
>  >
>  >
>  >  # Lots of QC and limma things.
>  > 
> :-)                                                                            
>  > 
>  >   probe <- top2[,1] #top2 is from topTable
>  >
>  >   sigLL <- unique(unlist(mget(probe, env=illuminaHumanv2ENTREZID,
>  > ifnotfound=NA)))
> 
> It appears that unique() strips off the names, so you should probably
> substitute something like this:
> 
> sigLL <- unlist(mget(probe, illuminaHumanv2ENTREZID))
> sigLL <- sigLL[!duplicated(sigLL)]
> 
>  >   sigLL <- as.character(sigLL[!is.na(sigLL)])
>  >     
>  >   params <- new("GOHyperGParams", geneIds= sigLL,
>  > annotation="illuminaHumanv2", ontology="CC", pvalueCutoff= 0.05, 
>  >   conditional=FALSE, testDirection="under")
>  >   hgOver <- hyperGTest(params)
>  >   res_filNavn <- paste(1, "_GO_summary_CC_under.html", sep = "")
>  >   htmlReport(hgOver,file=res_filNavn)
>  >
>  >
>  >   summary(hgOver)
>  >                GOCCID      Pvalue OddsRatio   ExpCount Count 
>  > Size                         Term
>  >   GO:0044422 GO:0044422 0.002652398 0.6469686  65.993365    46 
>  > 2345               organelle part
>  >   GO:0044446 GO:0044446 0.002652398 0.6469686  65.993365    46  2345
>  > intracellular organelle part
>  >   GO:0005634 GO:0005634 0.002722435 0.7098244 112.259076    88 
>  > 3989                      nucleus
>  >   GO:0030529 GO:0030529 0.002771273 0.2913123  13.114246     4   466   
>  > ribonucleoprotein complex
>  >   GO:0044428 GO:0044428 0.008533045 0.5194381  23.920836    13  
>  > 850                 nuclear part
>  >   GO:0005840 GO:0005840 0.028727650 0.2791575   6.922971     2  
>  > 246                     ribosome
>  >   GO:0005623 GO:0005623 0.037820314 0.7152750 340.717129   331
>  > 12107                         cell
>  >   GO:0044464 GO:0044464 0.038306007 0.7160755 340.688987   331
>  > 12106                    cell part
>  >   GO:0031981 GO:0031981 0.046628778 0.5403759  14.352502     8  
>  > 510                nuclear lumen
>  >
>  >
>  >    # Find the ID in the count colunm
>  >    probeSetSummary(hgOver)
>  >  
>  >    # This gives me all the genes (some entrez id are dublicatet because
>  > of their linkage to different probes) but I get a
>  >    warning message:
>  >  
>  >    Warning message:
>  >    The vector of geneIds used to create the GOHyperGParams object was
>  > not a named vector.
>  >    If you want to know the probesets that contributed to this result
>  >    you need to pass a named vector for geneIds.
>  > 
>  >
>  >
>  >
>  > I have tried to make a named vector but apparently I do not understand
>  > what it is, how can I make it work?
>  > And how can I get the probeSetSummary into a file? Any suggestions?
> 
> Sure. As I mentioned in my first email, you can use hyperG2annaffy() in
> affycoretools. Alternatively you can always use write.table().
> 
> Best,
> 
> Jim
> 
> 
>  > 
>  > Regards,
>  > Ingrid
>  >
>  >
>  > "James W. MacDonald" <jmacdon at med.umich.edu> writes:
>  >
>  >  > Hi Ingrid,
>  >  >
>  >  > Ingrid H. G. Østensen wrote:
>  >  >> Hi
>  >  >>
>  >  >> I am testing for GO in my dataset and I am able to make html pages
>  >  >> that contains different type of information. But I was wondering if
>  >  >> there is some way to find out which genes are in the Count column? It
>  >  >> might say 2, but not which 2 genes.
>  >  >
>  >  > See probeSetSummary() in GOstats and hyperG2annaffy() in 
> affycoretools.
>  >  > Note that for probeSetSummary() to work correctly you have to pass 
> in a
>  >  > *named* vector of Entrez Gene IDs, which you can get by using 
> unlist():
>  >  >
>  >  > my.named.probeids <- unlist(mget(probeID.vector,
>  >  > "chip.annotation.package.name"))
>  >
>  > So assuming the OP is using GOstats, R-2.5.x, and the latest available
>  > version installed using biocLite...
>  >
>  >   Please try
>  >
>  >        help("HyperGResult-accessors")
>  >
>  > + seth
>  >
>  > --
>  > Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research 
> Center
>  > BioC: http://bioconductor.org/
>  > Blog: http://userprimary.net/user/
>  >
> 
> --
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623