[BioC] clustering genes in GO categories

James MacDonald jmacdon at med.umich.edu
Thu Jan 6 22:09:50 CET 2011


Hi Assa,

I don't think you need a package for that. A call to tapply() followed by a call to do.call() should get you where you want to go.

Say you read your table into R, and call it 'dat'.

thelist <- tapply(1:nrow(dat), dat$GOMF, function(x) dat[x, 3])

Then you will have a list whose names are the GOMF terms and whose items are the gene ids for each term. Collapsing that to a matrix is awkward because each term has a different number of genes, so the rows would have different lengths. So you can either collapse each list item into a comma-separated string, or write directly to a file. Collapsing with commas is easy:

commalist <- lapply(thelist, paste, collapse = ",")
avector <- do.call("c", commalist)
names(avector) <- names(commalist)
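
As a minimal sketch of those two steps on made-up data (the toy data frame below is an assumption for illustration only, with a single GOMF term per row and the gene id in column 3):

```r
# Toy stand-in for 'dat': one GO term per row, gene id in column 3 (assumed layout)
dat <- data.frame(
  id = c("p1", "p2", "p3", "p4"),
  flybasename_gene = c("g1", "g2", "g3", "g4"),
  flybase_gene_id = c("FBgn0001128", "FBgn0053057", "FBgn0035889", "FBgn0040736"),
  GOMF = c("protein binding", "protein binding", "ATP binding", "hydrolase activity"),
  stringsAsFactors = FALSE
)

# Group the gene ids (column 3) by GO term
thelist <- tapply(seq_len(nrow(dat)), dat$GOMF, function(x) dat[x, 3])

# Collapse each group into one comma-separated string
commalist <- lapply(thelist, paste, collapse = ",")

# Flatten to a named character vector; unlist() carries the names along
avector <- unlist(commalist)
print(avector)
```

Each element of avector is then one GO term's genes as a single string, named by the term.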

or you could just write out to a file using something like

con <- file("mydata.txt", "w")

for(i in seq_along(commalist)) cat(names(commalist)[i], "\t", commalist[[i]], "\n", sep = "", file = con)

close(con)
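
If you would rather avoid the explicit loop, something along these lines might also work. This is only a sketch: the 'commalist' below is a made-up stand-in, and tempfile() is used in place of "mydata.txt" so that nothing gets overwritten.

```r
# Hypothetical stand-in for the 'commalist' built earlier
commalist <- list("protein binding" = "FBgn0001128,FBgn0053057",
                  "ATP binding" = "FBgn0035889")

# One line per GO term: the term, a tab, then the comma-separated gene ids
outlines <- paste(names(commalist), unlist(commalist), sep = "\t")

# Temporary path, used here instead of "mydata.txt"
outfile <- tempfile(fileext = ".txt")
writeLines(outlines, outfile)

# Read the file back to check what was written
readback <- readLines(outfile)
print(readback)
```

writeLines() opens and closes the file itself when given a file name, so there is no connection to manage.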

All untested, so you might have to fiddle a bit to get the results you want.
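
One thing to watch: the tapply() call assumes one GO term per row in dat$GOMF, whereas your file has a variable number of GO columns per line. A possible way around that, sketched here on made-up lines in the layout you describe (four fixed fields, then the GO terms, all tab-separated), is to split each line yourself and build a long table first. With a real file you would get 'lines' from readLines(); the names 'lines', 'long' and 'bygo' are just for illustration.

```r
# Made-up lines: four fixed fields, then a variable number of GO terms
lines <- c(
  "1616608_a_at\tGpdh\tFBgn0001128\t33824\thydrolase activity\tprotein binding",
  "1622892_s_at\tCG33057\tFBgn0053057\t318833\tprotein binding\tATP binding",
  "1622894_at\tCG15120\tFBgn0034454\t37248\tprotein binding"
)

# Split every line on tabs
fields <- strsplit(lines, "\t", fixed = TRUE)

# Gene id is the third field; GO terms are everything after the fourth
gene <- vapply(fields, function(f) f[3], character(1))
terms <- lapply(fields, function(f) f[-(1:4)])

# Long format: one (gene, term) pair per row
long <- data.frame(gene = rep(gene, lengths(terms)),
                   GOMF = unlist(terms),
                   stringsAsFactors = FALSE)

# Group gene ids by GO term, as with tapply() above
bygo <- split(long$gene, long$GOMF)
print(bygo)
```

The resulting list can be collapsed with commas or written to a file in the same way as 'thelist'.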

Best,

Jim

James W. MacDonald, M.S.
Biostatistician
Douglas Lab
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
>>> Assa Yeroslaviz  01/06/11 1:02 PM >>>
Hi, everybody,

I was wondering whether there is a package to cluster a list of genes into
different GO categories.

My problem is as follows:
I have a list of genes (a tab-delimited file):
id    flybasename_gene    flybase_gene_id    entrezgene    GOMF

1616608_a_at    Gpdh    FBgn0001128    33824    carboxylesterase activity    hydrolase activity    3',5'-cyclic-nucleotide phosphodiesterase activity    protein binding
1622892_s_at    CG33057    FBgn0053057    318833    nucleotide binding    protein binding    ATP binding    chaperone binding    ammonium transmembrane transporter activity
1622892_s_at    mkg-p    FBgn0035889    38955    nucleotide binding    protein binding    ATP binding    chaperone binding    ammonium transmembrane transporter activity
1622893_at    IM3    FBgn0040736    50209    aminopeptidase activity    metalloexopeptidase activity    hydrolase activity    manganese ion bindin
1622894_at    CG15120    FBgn0034454    37248    protein binding

I would like to group the genes by GO category, as listed here in the last
columns. The GO categories span more than one column, and the number is not
the same on each line, depending on the depth of the annotation for each gene.
Is there a way of transforming the table so that I have a list of my GO
categories in the first column, and then on each line a list of gene IDs (which
ID type is used is not important, as I can change them as I wish)?
I would like to have something like this:
GO    genes
protein binding     FBgn0001128    FBgn0053057     FBgn0035889 etc.
ammonium transmembrane transporter activity      FBgn0053057    FBgn0035889
hydrolase activity   FBgn0040736     FBgn0001128


I would appreciate any kind of help or ideas.

Thanks
Assa

