[BioC] GSEABase GOCollection wrong ontology

Martin Morgan mtmorgan at fhcrc.org
Sat Feb 28 01:57:44 CET 2009


Hans-Ulrich Klein <h.klein at uni-muenster.de> writes:

> Dear All,
>
> I want to generate sets of probes from the affy hgu133plus2 chip from
> the GO "Molecular Function". I used the GSEABase package:
>
>  > library("GSEABase")
>  > sets = GeneSetCollection(
>  +     idType=AnnotationIdentifier("hgu133plus2.db"),
>  +     setType=GOCollection(ontology="MF"))
>  > sets[[1]]
> setName: GO:0000002
> geneIds: 1557631_at, 202825_at, ..., 203466_at (total: 5)
> geneIdType: Annotation (hgu133plus2.db)
> collectionType: GO
>   ids:  (0 total)
>   evidenceCode: IMP IPI TAS ISS IDA NAS IEA IGI RCA IEP IC NR ND
>   ontology: MF
> details: use 'details(object)'
>
> The Gene Ontology web site says that the id "GO:0000002" belongs to
> "biological process". Something went wrong...

Hi Hans-Ulrich --

Note that collectionType(sets[[1]]) contains 0 ids.

The confusion (some on my part, no doubt) comes from the role that
GOCollection plays -- it is meant to represent a collection of one or
more GO ids. The 'ontology' argument filters those ids, so the
resulting collection holds only the ids that belong to the requested
ontology. Thus

> GOCollection("GO:0000002", ontology="MF")
collectionType: GO 
  ids:  (0 total)
  evidenceCode: IMP IPI TAS ISS IDA NAS IEA IGI RCA IEP IC NR ND 
  ontology: MF 
> GOCollection(c("GO:0000002", "GO:0000009"), ontology="MF")
collectionType: GO 
  ids: GO:0000009 (1 total)
  evidenceCode: IMP IPI TAS ISS IDA NAS IEA IGI RCA IEP IC NR ND 
  ontology: MF 
> GOCollection(c("GO:0000002", "GO:0000009"), ontology="BP")
collectionType: GO 
  ids: GO:0000002 (1 total)
  evidenceCode: IMP IPI TAS ISS IDA NAS IEA IGI RCA IEP IC NR ND 
  ontology: BP 

So sets[[1]] contains an empty GOCollection -- the probes form a
coherent set based on their GO id, but no GO id both defines the set
and satisfies the ontology constraint. Unfortunately, the set is still
named after the GO identifier used to group the probes (here
GO:0000002), which makes it look as though a BP term had been
classified as MF.
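
One way to see how many sets are affected is to count the GO ids left
in each set's collectionType. This is only a sketch: it assumes
hgu133plus2.db is installed, rebuilds 'sets' as in the original post,
and uses 'nIds' purely as an illustrative name. The counts will depend
on the package versions in use.

```r
library(GSEABase)
library(hgu133plus2.db)

# Rebuild the collection from the original post
sets <- GeneSetCollection(idType = AnnotationIdentifier("hgu133plus2.db"),
                          setType = GOCollection(ontology = "MF"))

# Number of GO ids that survive the ontology filter in each set
nIds <- sapply(sets, function(s) length(ids(collectionType(s))))

# How many sets carry an empty GOCollection, and a few of their names
table(nIds == 0)
head(names(sets)[nIds == 0])
```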

A more reasonable result is probably something like

> sets[ids(GOCollection(names(sets), ontology="MF"))]
GeneSetCollection
  names: GO:0000009, GO:0000010, ..., GO:0060230 (2495 total)
  unique identifiers: 218444_at, 220865_s_at, ..., 229922_at (31278 total)
  types in collection:
    geneIdType: AnnotationIdentifier (1 total)
    collectionType: GOCollection (1 total)
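
The same subsetting can be wrapped in a small function. The helper
name 'subsetByOntology' and the toy probe ids below are made up for
illustration and are not part of GSEABase. The sketch assumes the set
names are GO ids, as they are for the collection above.

```r
library(GSEABase)

# Hypothetical helper (not part of GSEABase): keep only the sets whose
# name is a GO id belonging to the requested ontology.
subsetByOntology <- function(sets, ontology) {
    keep <- ids(GOCollection(names(sets), ontology = ontology))
    sets[keep]
}

# Toy collection with made-up probe ids, named after the two GO ids
# used in the examples above
toy <- GeneSetCollection(
    GeneSet(c("probe1", "probe2"), setName = "GO:0000002"),
    GeneSet(c("probe3", "probe4"), setName = "GO:0000009"))

names(subsetByOntology(toy, "MF"))
names(subsetByOntology(toy, "BP"))
```

With the real data, subsetByOntology(sets, "MF") should be equivalent
to the one-liner above.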

At this point it is worth pointing out that evidenceCode behaves
differently, removing probes that do not satisfy the evidenceCode
constraint. This reflects the fact that the evidence code is not a
property of the GO term, but of the geneId.
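
A rough way to see the difference is to build the collection with and
without an evidence code restriction and compare set sizes. This is a
sketch, assuming hgu133plus2.db is installed. The choice of "IDA" and
the variable names are arbitrary.

```r
library(GSEABase)
library(hgu133plus2.db)

idType <- AnnotationIdentifier("hgu133plus2.db")

# All evidence codes versus probes annotated with IDA evidence only
setsAll <- GeneSetCollection(idType = idType, setType = GOCollection())
setsIDA <- GeneSetCollection(idType = idType,
                             setType = GOCollection(evidenceCode = "IDA"))

# Compare the number of probes per set for sets present in both
shared <- intersect(names(setsAll), names(setsIDA))
nAll <- lengths(geneIds(setsAll[shared]))
nIDA <- lengths(geneIds(setsIDA[shared]))

# Restricted sets are expected to be no larger than unrestricted ones
summary(nAll - nIDA)
```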

Martin

> Regards,
> Hans-Ulrich

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793