[BioC] How to get Gene ontology (GO) terms per probe

James W. MacDonald jmacdon at uw.edu
Thu Oct 25 18:42:48 CEST 2012


Hi Rafi,

On 10/23/2012 6:59 PM, Rafi [guest] wrote:
> I am new to R/BioC. I am trying to do GO-based clustering of genes. The input (for the package csbl.go) needs to be gene name and GO terms in each row. Example:

Hmm. Weird that this package doesn't have facilities to do this. Anyway, 
not that difficult, starting after your line that creates the testid object:

d.f <- select(rat2302.db, testid, c("SYMBOL", "GO"))
## note there is a space between the quotes in collapse = " "
out <- data.frame(tapply(d.f$GO, d.f$SYMBOL, paste, collapse = " "))
write.table(out, "input_for_csbl.txt", col.names = FALSE, quote = FALSE)
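
One caveat: the same GO ID can show up more than once per probe with different evidence codes (see GO:0008152 in the toTable() output quoted below), so the pasted string may contain repeats. A minimal sketch of collapsing only the unique IDs follows. It uses a made-up toy data frame in place of the select() result so it runs without any annotation package; the probe IDs "p1"/"p2" and the pairing of symbols to GO IDs are assumptions for illustration only.

```r
# Toy stand-in for the data frame returned by select(); the values are
# illustrative and are NOT real rat2302.db annotations.
d.f <- data.frame(
  PROBEID = c("p1", "p1", "p1", "p2", "p2", "p2"),
  SYMBOL  = c("AP4B1", "AP4B1", "AP4B1", "BCAS2", "BCAS2", "BCAS2"),
  GO      = c("GO:0005215", "GO:0005215", "GO:0005488",
              "GO:0005515", "GO:0005634", NA),
  stringsAsFactors = FALSE
)

# Drop rows that have no GO ID, so the string "NA" is never pasted in.
d.f <- d.f[!is.na(d.f$GO), ]

# Collapse the unique GO IDs for each symbol into one space-separated string.
go_per_gene <- tapply(d.f$GO, d.f$SYMBOL,
                      function(x) paste(unique(x), collapse = " "))

# One line per gene: the symbol followed by its GO IDs.
lines <- paste(names(go_per_gene), go_per_gene)
cat(lines, sep = "\n")
```

Replacing the cat() call with writeLines(lines, "input_for_csbl.txt") should give the same kind of file as the write.table() call above.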

Best,

Jim
>
> AP4B1 GO:0005215 GO:0005488 GO:0005515 GO:0005625
> BCAS2 GO:0005515 GO:0005634 GO:0005681 GO:0008380
>
> I tried using annotate in bioconductor:
>
> library("rat2302.db")
> library(annotate)
> testid<-c("1367462_at","1380262_at", "1392516_a_at", "1396521_at")
> goid1<- rat2302GO[testid]
>
> But I get each GO term in a separate row:
>
> toTable(goid1)
>
> probe_id      go_id Evidence Ontology
> 1  1367462_at GO:0008152      IEA       BP
> 2  1367462_at GO:0008152      ISO       BP
> 3  1367462_at GO:0006508      IMP       BP
> 4  1367462_at GO:0005886      IEA       CC
> 5  1367462_at GO:0005737      IEA       CC
> 6  1380262_at GO:0005575       ND       CC
> 7  1380262_at GO:0005634      IEA       CC
> 8  1380262_at GO:0005737      IEA       CC
> 9  1367462_at GO:0005509      IEA       MF
> 10 1367462_at GO:0005509      TAS       MF
>
> Is there any easier way to get all GO terms per gene/probe?
>
> Any help is greatly appreciated.
>
> Thanks
> Rafi
>
>   -- output of sessionInfo():
>
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] csbl.go_1.4.0        RUnit_0.4.26         cluster_1.14.2       GO.db_2.7.1          BiocInstaller_1.4.9
>   [6] annotate_1.34.1      rat2302.db_2.7.1     org.Rn.eg.db_2.7.1   RSQLite_0.11.1       DBI_0.2-5
> [11] AnnotationDbi_1.18.1 Biobase_2.16.0       BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] IRanges_1.14.4 stats4_2.15.0  tools_2.15.0   XML_3.9-4.1    xtable_1.7-0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


