[BioC] Oligo package annotation

Fri Dec 14 15:58:48 CET 2012

Please don't take conversations off-list. We like to think of the list 
archives as a repository of information.

On 12/14/2012 5:45 AM, Bruno Giotti wrote:
> OK, thanks, but how do I query the pd.hugene.1.1.st.v1 annotation
> package to retrieve some useful IDs? I could use the package you
> suggested, but I'd first like to understand how to use this one
> (pd.hugene.1.1.st.v1).

The pd.hugene.1.1.st.v1 package is NOT an annotation package. Instead, 
it maps the locations of probes on the array to different probesets. 
This package is used by oligo to decide which probes go into which 
probeset, so you can summarize at different levels (e.g., for the HuGene 
arrays, at the 'probeset' level, which is roughly exon-level, or at the 
transcript level).
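
As a rough sketch of what those two levels look like in oligo (summarizeBoth 
and celDir are made-up names for illustration, and "core" is the target name 
oligo uses for transcript-level summaries on the Gene ST arrays):

```r
library(oligo)

# Hypothetical helper: summarize one directory of CEL files at both levels.
# 'celDir' is an assumed path to the folder holding the HuGene CEL files.
summarizeBoth <- function(celDir) {
  raw <- read.celfiles(list.celfiles(celDir, full.names = TRUE))
  list(
    transcript = rma(raw, target = "core"),     # transcript-level summaries
    probeset   = rma(raw, target = "probeset")  # roughly exon-level summaries
  )
}
```

The pd.hugene.1.1.st.v1 package is picked up automatically when the CEL files 
are read, which is why it needs to be installed even though you never query it 
directly.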

Unless you have a real need to know where things are on the chip, the 
pd.hugene.1.1.st.v1 package is not of much use. Well, let me take that 
back. I have found that the intronic background controls have a really 
bad habit of popping up in lists of differentially expressed genes. 
There are any number of hypotheses that I can come up with that would 
explain why this is so, but in the end I haven't found any end users who 
really care. So I use the pd.hugene.1.1.st.v1 package to figure out 
which probesets are controls, and exclude those prior to selecting 
differentially expressed genes. The getMainProbes() function in 
affycoretools is useful in this respect.
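
A minimal sketch of that filtering step, assuming 'eset' is the ExpressionSet 
you got from rma() at the transcript level (removeControls is a made-up 
wrapper name, and the exact arguments of getMainProbes() may differ between 
affycoretools versions, so check ?getMainProbes first):

```r
library(affycoretools)

# Hypothetical wrapper around getMainProbes(): drops the control probesets
# and reports how many were removed. 'eset' is assumed to be a
# transcript-level ExpressionSet from oligo's rma().
removeControls <- function(eset) {
  before <- nrow(eset)
  eset <- getMainProbes(eset)
  message(before - nrow(eset), " control probesets removed")
  eset
}
```

Doing this before fitting a model means the controls never show up in the 
list of differentially expressed genes in the first place.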

So back to the story at hand. Since the pd.hugene.1.1.st.v1 package 
doesn't do annotations, you need to use the 
hugene11sttranscriptcluster.db package. It does use a SQLite database as 
its backend, but unless you like to do SQL queries this is of no relevance.
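
For the non-SQL route, here is a hedged sketch using select() from 
AnnotationDbi. The IDs are just the first few keys in the package so that the 
example runs by itself; with real data you would use featureNames() on your 
own ExpressionSet (called 'eset' here, which is an assumption). Older 
AnnotationDbi releases may spell the 'columns' argument 'cols'.

```r
library(hugene11sttranscriptcluster.db)

# Stand-in IDs so the example runs on its own; with real data use
# ids <- featureNames(eset), where 'eset' comes from rma() at transcript level
ids <- head(keys(hugene11sttranscriptcluster.db, keytype = "PROBEID"))

# Map transcript cluster IDs to Entrez Gene IDs and gene symbols
annot <- select(hugene11sttranscriptcluster.db, keys = ids,
                columns = c("ENTREZID", "SYMBOL"), keytype = "PROBEID")

# select() can return several rows per ID; keep the first match for each
annot <- annot[!duplicated(annot$PROBEID), ]

# With real data, the annotation can then be bound to the expression matrix:
# out <- data.frame(annot, exprs(eset)[annot$PROBEID, ], check.names = FALSE)
```

IDs with no gene assigned (controls, for example) come back with NA in the 
ENTREZID and SYMBOL columns rather than being dropped.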

The canonical reference for using these annotation packages is the Intro 
to Annotation Packages, which can be accessed by

library(hugene11sttranscriptcluster.db)
openVignette()

and then choosing

AnnotationDbi - AnnotationDbi: Introduction To Bioconductor Annotation Packages

If you care about the internals, you can read

AnnotationDbi - How to use bimaps from the ".db" annotation packages

And if you just want to create annotated output, take a look at the 
annaffy package, which automates these things.

Best,

Jim

> Thanks again
>
> > Date: Thu, 13 Dec 2012 11:53:52 -0500
> > From: jmacdon at uw.edu
> > To: guest at bioconductor.org
> > CC: bioconductor at r-project.org; latini18 at hotmail.com; 
> Benilton.Carvalho at cancer.org.uk
> > Subject: Re: [BioC] Oligo package annotation
> >
> >
> >
> > On 12/13/2012 11:48 AM, Bruno [guest] wrote:
> > > Hi all,
> > > My question is quite straightforward: how do I retrieve the EntrezId
> > > or geneSymbol for pd.hugene.1.1.st.v1 to merge into my gene expression
> > > matrix? I haven't found any vignettes explaining this. I know that the
> > > annotation file is a SQLite DB which I have to query. However, I'm
> > > failing to find the tables I need. Sorry if I am still not explaining
> > > myself well enough.
> >
> > It depends on what level you used for summarization. Assuming that you
> > used transcript-level summarization (which I would highly recommend),
> > you want to use the hugene11sttranscriptcluster.db package. If you did
> > something like
> >
> > rma(<filename>, target="probeset")
> >
> > then you want the hugene11stprobeset.db package.
> >
> > Best,
> >
> > Jim
> >
> >
> > >
> > >
> > > -- output of sessionInfo():
> > >
> > > R version 2.15.1 (2012-06-22)
> > > Platform: x86_64-pc-mingw32/x64 (64-bit)
> > >
> > > locale:
> > > [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252
> > > [4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252
> > >
> > > attached base packages:
> > > [1] stats graphics grDevices utils datasets methods base
> > >
> > > other attached packages:
> > > [1] pd.hugene.1.1.st.v1_3.8.0 oligo_1.22.0 affyPLM_1.34.0 preprocessCore_1.20.0 latticeExtra_0.6-24
> > > [6] lattice_0.20-10 RColorBrewer_1.0-5 BiocInstaller_1.8.3 simpleaffy_2.34.0 gcrma_2.30.0
> > > [11] genefilter_1.40.0 affy_1.36.0 limma_3.14.3 RSQLite_0.11.2 DBI_0.2-5
> > > [16] Biobase_2.18.0 oligoClasses_1.20.0 BiocGenerics_0.4.0
> > >
> > > loaded via a namespace (and not attached):
> > > Error in x[["Version"]] : subscript out of bounds
> > > In addition: Warning message:
> > > In FUN(c("affxparser", "affyio", "annotate", "AnnotationDbi", "Biostrings", :
> > > DESCRIPTION file of package 'survival' is missing or broken
> > >
> > > --
> > > Sent via the guest posting facility at bioconductor.org.
> > >
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
> >

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099