[BioC] Building the tomato annotation library (Affy)

Tue Dec 11 17:11:34 CET 2012

Hi Jorge,

On 12/10/2012 12:00 PM, Jorge Mena-Ali wrote:
> I'm trying to obtain the annotation file for the Affy tomato chip.  Any
> suggestions on specific code to append this file to the eset will be
> appreciated.

There are two general ways to handle this situation (that I know of).

1.) Just use the Affy annotation file directly.
2.) Build an org package and then use, say, UniGene or Entrez Gene IDs
from the annotation file to map things.

For #1, you can download the csv file from Affy and do something like

 > dat <- read.csv("Tomato.na33.annot.csv", header = TRUE, skip = 13,
                   na.strings = "---")
 > names(dat)
  [1] "Probe.Set.ID"                     "GeneChip.Array"
  [3] "Species.Scientific.Name"          "Annotation.Date"
  [5] "Sequence.Type"                    "Sequence.Source"
  [7] "Transcript.ID.Array.Design."      "Target.Description"
  [9] "Representative.Public.ID"         "Archival.UniGene.Cluster"
[11] "UniGene.ID"                       "Genome.Version"
[13] "Alignments"                       "Gene.Title"
[15] "Gene.Symbol"                      "Chromosomal.Location"
[17] "Unigene.Cluster.Type"             "Ensembl"
[19] "Entrez.Gene"                      "SwissProt"
[21] "EC"                               "OMIM"
[23] "RefSeq.Protein.ID"                "RefSeq.Transcript.ID"
[25] "FlyBase"                          "AGI"
[27] "WormBase"                         "MGI.Name"
[29] "RGD.Name"                         "SGD.accession.number"
[31] "Gene.Ontology.Biological.Process" "Gene.Ontology.Cellular.Component"
[33] "Gene.Ontology.Molecular.Function" "Pathway"
[35] "InterPro"                         "Trans.Membrane"
[37] "QTL"                              "Annotation.Description"
[39] "Annotation.Transcript.Cluster"    "Transcript.Assignments"
[41] "Annotation.Notes"

and then you can use existing R functions such as merge() (<- and that
is a hint right there) to join your set of significant (or not)
probesets to whichever annotation columns you need, matching on
Probe.Set.ID.
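
As a rough sketch of that merge() step: the probeset IDs, the logFC
values and the 'sig' data frame below are invented for illustration, and
'dat_toy' is only a small stand-in for the 'dat' read in above. The idea
is a left join that keeps every significant probeset and adds
annotation where there is any.

```r
# Toy stand-in for the annotation data frame 'dat' read from the Affy csv
# (probeset IDs are hypothetical; only three columns are kept).
dat_toy <- data.frame(
  Probe.Set.ID = c("probe_1_at", "probe_2_at", "probe_3_at"),
  UniGene.ID   = c("Les.1268", NA, "Les.17835"),
  Gene.Symbol  = c("MKP1", NA, "SNF1"),
  stringsAsFactors = FALSE
)

# Hypothetical table of significant probesets, e.g. from a differential
# expression analysis; the column names are assumptions.
sig <- data.frame(
  ID    = c("probe_3_at", "probe_1_at"),
  logFC = c(1.8, -2.1),
  stringsAsFactors = FALSE
)

# Left join: keep every row of 'sig' and add annotation columns
# wherever the probeset ID matches.
annotated <- merge(sig, dat_toy, by.x = "ID", by.y = "Probe.Set.ID",
                   all.x = TRUE)
annotated
```

With all.x = TRUE, probesets that have no annotation row still show up
in the result, with NA in the annotation columns.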

However, the Affy annotations are static as to the build date, and may 
be pretty stale by the time you get to them. You can always go to NCBI 
and build your own organism-level package, and use that to do the 
annotations.

 > library(AnnotationForge)
 > makeOrgPackageFromNCBI(version = "0.0.1", author = "me",
                          maintainer = "me <me@mine.org>", outputDir = ".",
                          tax_id = 4081, genus = "Solanum",
                          species = "lycopersicum")
Loading required package: GO.db

Getting data for gene2pubmed.gz
Loading required package: RCurl
Loading required package: bitops
Populating gene2pubmed table:
table gene2pubmed filled
Getting data for gene2accession.gz
<other blahblahblah snipped>
Creating package in ./org.Slycopersicum.eg.db
[1] TRUE

So after waiting a while, I get this message telling me a package has
been made. And now I need to install it.

 > install.packages("org.Slycopersicum.eg.db", repos = NULL,
                    type = "source")
* installing *source* package org.Slycopersicum.eg.db ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded

* DONE (org.Slycopersicum.eg.db)

Now you can use this package to annotate things:

 > library(org.Slycopersicum.eg.db)
 > x <- as.character(sample(dat$UniGene.ID[!is.na(dat$UniGene.ID)], 25))
 > select(org.Slycopersicum.eg.db, x, c("SYMBOL","GENENAME"), "UNIGENE")
     UNIGENE    SYMBOL                  GENENAME
1  Les.20210      <NA>                      <NA>
2  Les.11435      <NA>                      <NA>
3  Les.12414      <NA>                      <NA>
4  Les.17835      SNF1              SNF1 protein
5   Les.1796      <NA>                      <NA>
6        ---      <NA>                      <NA>
7   Les.1268      MKP1    MAP kinase phosphatase
8   Les.7575      <NA>                      <NA>
9   Les.7326      <NA>                      <NA>
<snip>
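
Note that a "---" placeholder made it into the keys above (row 6). A
minimal sketch of cleaning the keys before calling select(), and of
getting the result back onto probesets, might look like the following.
The probeset IDs are invented, the " /// " separator for multi-valued
fields is an assumption about the csv layout, and 'annot' is only a
hand-made stand-in for what select() would return.

```r
# Toy version of the two relevant columns of 'dat' (probeset IDs are
# hypothetical; the " /// " multi-value separator is an assumption).
ids <- data.frame(
  Probe.Set.ID = c("probe_1_at", "probe_2_at", "probe_3_at"),
  UniGene.ID   = c("Les.1268", "---", "Les.17835 /// Les.7575"),
  stringsAsFactors = FALSE
)

# Split multi-valued entries into one row per probeset/UniGene pair.
parts <- strsplit(ids$UniGene.ID, " /// ", fixed = TRUE)
long <- data.frame(
  Probe.Set.ID = rep(ids$Probe.Set.ID, lengths(parts)),
  UNIGENE      = trimws(unlist(parts)),
  stringsAsFactors = FALSE
)

# Drop placeholders and missing values before using the IDs as keys.
long <- long[!is.na(long$UNIGENE) & long$UNIGENE != "---", ]
keys <- unique(long$UNIGENE)

# Stand-in for the data frame that
# select(org.Slycopersicum.eg.db, keys, c("SYMBOL","GENENAME"), "UNIGENE")
# would return; typed in by hand here so the sketch runs on its own.
annot <- data.frame(
  UNIGENE  = c("Les.1268", "Les.17835", "Les.7575"),
  SYMBOL   = c("MKP1", "SNF1", NA),
  GENENAME = c("MAP kinase phosphatase", "SNF1 protein", NA),
  stringsAsFactors = FALSE
)

# Attach the gene-level annotation back to the probesets.
by_probe <- merge(long, annot, by = "UNIGENE", all.x = TRUE)
by_probe
```

In a real session you would replace 'annot' with the actual select()
output for 'keys'.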

Best,

Jim

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099