[BioC] How to map probeset_id to gene symbols or other annotation information?

Peng Yu pengyu.ut at gmail.com
Mon Aug 10 20:03:20 CEST 2009


On Mon, Aug 10, 2009 at 11:52 AM, Marc Carlson<mcarlson at fhcrc.org> wrote:
> Hi Peng,
>
> There is in fact a lot of documentation inside each package if you
> know how to look for it.  One form is manual pages; the objects they
> document can be listed like this example:
>
> ls("package:mogene10stprobeset.db")
>
> And then you can read the manual pages by typing ? followed by the name
> of the object you want to know about like this example:
>
> ?mogene10stprobesetENTREZID
>
> Finally, almost every Bioconductor package has some sort of vignette
> associated with it.  In the case of the annotation packages, there
> are three vignettes loaded with AnnotationDbi (which will always be
> loaded before any annotation package, so they will always be there if
> you look).  You can load a vignette by using the openVignette() command
> like this:
>
> openVignette()
>
> And then just pick the number for the vignette that you would like to
> read.  Reading the vignette will give a much more comprehensive overview
> of the purpose of the package with even more examples than the manual
> pages.  Both of these resources are critical if you want to be able to
> use R.  I would recommend that you look at these in addition to reading
> the R user manual that was mentioned before.
>
> With respect to the annotation packages, they are not simply a repeat of
> what is in the csv files from Affymetrix.  In fact, we don't actually
> even know where Affymetrix gets the data in those files from, nor do we
> use most of the data in those files when building the annotation
> packages.  Instead we go directly to the source whenever possible and get
> most of our information from places like NCBI, the EBI etc.  The only
> information that we get from Affymetrix is the basic probe to gene
> mapping data (in the form of probe to entrez gene, genbank accession
> etc.) which we then map onto the information from primary sources such
> as NCBI etc. in order to tie the other data to the probes.  You are free
> of course to use whichever information source you prefer, but please be
> advised that they are probably not equivalent.
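The ls()/? approach Marc describes can be narrowed with a regular expression so that only the relevant objects are listed. A minimal sketch, using the base package "stats" purely as a stand-in because it ships with every R installation; for the annotation package you would attach mogene10stprobeset.db and list "package:mogene10stprobeset.db" instead, e.g. with pattern = "SYMBOL" (assuming it follows the usual <prefix>SYMBOL naming of chip packages).

```r
# Stand-in package: "stats" is attached by default in R and Rscript.
library(stats)

# List every object exported by the attached package.
all_objects <- ls("package:stats")

# Narrow the listing with a regular expression: here, names starting with "t.".
t_objects <- ls("package:stats", pattern = "^t\\.")
print(t_objects)

# Any listed name can then be looked up in the manual, e.g. ?t.test
```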

Hi Marc,

I ran the example shown in ?mogene10stprobesetENTREZID. It
doesn't give a very meaningful error message (shown at the end of this
message). Do you know what the problem might be?

I also ran the following code, but I don't quite understand what the
word 'vignette' means, especially in R. Is a 'vignette' a piece of
package documentation? Another problem is how to choose the most
relevant vignette wisely when ten vignettes are listed.

> library(mogene10stprobeset.db)
> openVignette()
Please select a vignette:

 1: AnnotationDbi - AnnotationDbi
 2: AnnotationDbi - Creating probe packages
 3: AnnotationDbi - SQLForge
 4: Biobase - An introduction to Biobase and ExpressionSets
 5: Biobase - Bioconductor Overview
 6: Biobase - esApply Introduction
 7: Biobase - Notes for eSet developers
 8: Biobase - Notes for writing introductory 'how to' documents
 9: Biobase - quick views of eSet instances
10: DBI - A Common Database Interface (DBI)
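One way to avoid scanning every vignette that openVignette() offers is the base R function vignette() from the utils package, which can list the vignettes of a single package. A minimal sketch, using "grid" only as a stand-in because it ships with R; assuming AnnotationDbi is installed, vignette(package = "AnnotationDbi") should restrict the listing to the AnnotationDbi vignettes shown above.

```r
# List the vignettes of one package only ("grid" is a stand-in that ships with R).
vig <- vignette(package = "grid")

# The listing is a "packageIQR" object; its 'results' matrix has an "Item"
# column (the name to pass back to vignette()) and a "Title" column.
vig_table <- vig$results[, c("Item", "Title"), drop = FALSE]
print(vig_table)

# Open a specific one by name, e.g. vignette("grid", package = "grid")
```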

Based on your last advice, it is usually better to use the
annotation package rather than the Affymetrix csv files, right?

Regards,
Peng

$ Rscript run.R
> library(mogene10stprobeset.db)
Loading required package: methods
Loading required package: AnnotationDbi
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'openVignette()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation(pkgname)'.

Loading required package: DBI
> x <- mogene10stprobesetENTREZID
> # Get the probe identifiers that are mapped to an ENTREZ Gene ID
> mapped_probes <- mappedkeys(x)
> # Convert to a list
> xx <- as.list(x[mapped_probes])
Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: String or BLOB exceeded size limit)
Calls: as.list ... dbGetQuery -> sqliteQuickSQL -> sqliteExecStatement -> .Call
Execution halted
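A plausible (unconfirmed) reading of the error above is that x[mapped_probes] sends every probe ID to SQLite in one statement, and that statement exceeds SQLite's size limit. If so, looking the keys up in smaller batches should stay under the limit. The helper below, lookup_in_chunks(), is hypothetical and not part of AnnotationDbi; it is demonstrated with a fake lookup function so it runs without any Bioconductor package installed.

```r
# Hypothetical helper (not part of AnnotationDbi): apply a lookup function to a
# long vector of keys in fixed-size chunks and concatenate the named lists it
# returns, so that no single call has to handle every key at once.
lookup_in_chunks <- function(keys, lookup_fun, chunk_size = 1000L) {
  stopifnot(chunk_size >= 1L)
  if (length(keys) == 0L) return(list())
  # Chunk index for each key: 1,...,1,2,...,2,... (order is preserved).
  chunk_id <- ceiling(seq_along(keys) / chunk_size)
  chunks <- split(keys, chunk_id)
  # One lookup per chunk; unname() keeps c() from prefixing chunk numbers.
  results <- lapply(chunks, lookup_fun)
  do.call(c, unname(results))
}

# Stand-in lookup for demonstration: maps each key to the length of its name.
fake_lookup <- function(k) setNames(as.list(nchar(k)), k)

probes <- sprintf("probe_%04d", 1:2500)
res <- lookup_in_chunks(probes, fake_lookup, chunk_size = 1000L)
length(res)
```

With the annotation package loaded, the lookup function could be something like function(k) as.list(x[k]), with x and mapped_probes as in the script above; that combination is untested here, and the chunk size may need to be lowered if the error persists.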
