[BioC] GSVA: using Entrez ID's as identifiers

Thu Nov 17 01:41:16 CET 2011

Hi Tom,

At least one reason why the miRNA packages have less information is 
that there is simply less annotation in existence for miRNA work.  
People have been studying genes for many decades, and miRNAs became 
popular much more recently.  That said, there are a few annotation 
packages for miRNA work that you can find on our website here:

http://www.bioconductor.org/packages/release/data/annotation/

And we are always interested in adding more resources if you have the 
inclination.  For miRNAs, the instructions for making a new package that 
you will find in AnnotationDbi will probably be of somewhat limited 
utility.  But we have been working to make adding new annotation 
resources easier for people, so please contact me if you are interested 
in doing that and I might be able to help you create something new.

OTOH, if you just want to get the miRNAs mapped to target genes, you 
might want to look at Jim Reid's targetscan packages (found on the same 
page listed above).
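A quick way to see what such a package offers is to list what it exports. The following is only a sketch: `list_exports` is a hypothetical helper, and `targetscan.Hs.eg.db` is assumed to be the name of the human targetscan package. The helper returns an empty vector when the package is not installed.

```r
# Hypothetical helper: names exported by an installed package,
# or an empty character vector when the package is missing.
list_exports <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  sort(getNamespaceExports(pkg))
}

# Assumed package name for the human targetscan annotation.
maps <- list_exports("targetscan.Hs.eg.db")
if (length(maps) == 0) {
  message("targetscan.Hs.eg.db is not installed")
} else {
  print(maps)
}
```

The exported names should point to the mapping objects that the package manual then describes in detail.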

   Marc

On 11/15/2011 02:24 PM, Tom Keller wrote:
> Greetings,
> The annotation for the miRNA chip does not seem to have the same amount of information as the hgu95 db. Is there some help available for mapping miRNA probes to their target genes?
>
> thanks
> Thomas (Tom) Keller, PhD
> kellert at ohsu.edu
> 503.494.2442
> 6588 R Jones Hall (BSc/CROET)
> MMI DNA Services
> Member of OHSU Shared Resources
>
> On Nov 14, 2011, at 11:28 PM, Robert Castelo wrote:
>
>> hi Wendy,
>>
>> i'm afraid you need to get a little bit acquainted with the way in which
>> annotations are handled in BioC. a good starting point could be looking
>> at the vignette "AnnotationDbi: How to use the .db annotation packages"
>> from the AnnotationDbi package.
>>
>> the short answer to your problem is that hgu95a is not the only platform
>> for which annotations exist in BioC. basically there is an annotation
>> package for each platform supported by BioC (you can look all of them up
>> by going to http://www.bioconductor.org/packages/release/BiocViews.html
>> and clicking on "AnnotationData"), but in order to use one such annotation
>> package you need to
>>
>> 1. install it once in your system via source() and biocLite() just as
>> with every software package
>>
>> 2. load it via the library() function.
>>
>> in order to use the human organism-level package i mentioned in my
>> previous email you need to install it first and then load it prior to doing
>> anything else with it.
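The two steps can be sketched as follows. This is only a sketch: `ensure_pkg` is a hypothetical helper, and it relies on `BiocManager::install()`, the installer that has since replaced the biocLite() script mentioned above. Installation needs network access, so it only happens when explicitly requested.

```r
# Hypothetical helper: TRUE if the package can be loaded, optionally
# installing it first through BiocManager (assumed to be installed).
ensure_pkg <- function(pkg, install = FALSE) {
  if (!requireNamespace(pkg, quietly = TRUE) && install) {
    BiocManager::install(pkg, update = FALSE, ask = FALSE)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Step 1: install once (set install = TRUE on first use).
ok <- ensure_pkg("org.Hs.eg.db", install = FALSE)

# Step 2: load it in every session before doing anything else with it.
if (ok) {
  library(org.Hs.eg.db)
} else {
  message("org.Hs.eg.db is not installed yet")
}
```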
>>
>> let me know if this still does not solve your problem.
>>
>> cheers,
>> robert.
>>
>> On Mon, 2011-11-14 at 18:40 -0500, Wendy Qiao wrote:
>>> Hi Robert,
>>>
>>> Thank you for your reply. I happened to convert all the genes to
>>> hgu95a probe IDs as I found that this is the only platform that works
>>> with ExpressionSet. It would be great if we could make the Entrez IDs
>>> work. Following is the error that I got with your code.
>>>
>>>
>>> Thank you.
>>> Wendy
>>>
>>>
>>> > BcellSet
>>> ExpressionSet (storageMode: lockedEnvironment)
>>> assayData: 12148 features, 7 samples
>>>   element names: exprs
>>> protocolData: none
>>> phenoData
>>>   sampleNames: Illumi_PREBCEL_1 Illumi_PREBCEL_2 ... Affy_PREBCEL_4 (7 total)
>>>   varLabels: CellType Platform Replicates
>>>   varMetadata: labelDescription
>>> featureData: none
>>> experimentData: use 'experimentData(object)'
>>> Annotation: org.Hs.eg.db
>>> preBcell.KEGG<-gsva(BcellSet,KEGGc2BroadSets,abs.ranking=FALSE)$es.obs
>>> Mapping identifiers between gene sets and feature names
>>> Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ..., verbose = verbose)) :
>>>   error in evaluating the argument 'object' in selecting a method for function 'GeneSetCollection': Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
>>>   object 'org.Hs.egENTREZID' not found
>>>
>>>
>>>
>>>
>>> On 14 November 2011 12:27, Robert Castelo <robert.castelo at upf.edu> wrote:
>>>         hi Wendy,
>>>
>>>         sorry for my late answer. in principle there is no problem for the
>>>         gsva() function to take Entrez IDs in your expression data matrix.
>>>
>>>         if the expression data comes as a matrix, and rows are annotated with
>>>         Entrez IDs and the gene sets are also annotated with Entrez IDs, there
>>>         should be absolutely no problem.
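A quick sanity check for the matrix case is to confirm that the row names and the gene sets really share identifiers. The following is a minimal sketch in base R; the expression values and the gene set called `toy_set` are made up for illustration, "999999" is a placeholder ID that is deliberately absent from the matrix, and only the other Entrez IDs (7157 for TP53, 672 for BRCA1, 1956 for EGFR) are real.

```r
# Toy expression matrix: rows are Entrez IDs, columns are samples.
set.seed(1)
expr <- matrix(rnorm(12), nrow = 3,
               dimnames = list(c("7157", "672", "1956"),
                               paste0("sample", 1:4)))

# Hypothetical gene sets, also given as Entrez IDs (character vectors).
gene_sets <- list(toy_set = c("7157", "672", "999999"))

# Fraction of each gene set found among the row names of the matrix.
coverage <- vapply(gene_sets,
                   function(ids) mean(ids %in% rownames(expr)),
                   numeric(1))
print(coverage)
```

A coverage close to zero would suggest that the matrix and the gene sets use different identifier types.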
>>>
>>>         if the expression data comes as an ExpressionSet object where the
>>>         'features' are not Affy probe IDs but just Entrez IDs, just make sure
>>>         that the annotation slot has the corresponding organism-level package.
>>>         for instance, in the case of human:
>>>
>>>         annotation(eset)<- "org.Hs.eg.db"
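Put together, building such an ExpressionSet could look like the sketch below. It assumes the Biobase package is installed; the expression values are random toy data, and only the Entrez IDs used as feature names are real.

```r
library(Biobase)

# Toy expression matrix with Entrez IDs as feature (row) names.
set.seed(1)
expr <- matrix(rnorm(12), nrow = 3,
               dimnames = list(c("7157", "672", "1956"),
                               paste0("sample", 1:4)))

# Wrap the matrix in an ExpressionSet and point the annotation slot
# at the human organism-level package.
eset <- ExpressionSet(assayData = expr)
annotation(eset) <- "org.Hs.eg.db"

featureNames(eset)
annotation(eset)
```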
>>>
>>>         let me know if you have any problem with this.
>>>
>>>         cheers,
>>>         robert.
>>>
>>>         On Fri, 2011-11-11 at 14:44 -0500, Wendy Qiao wrote:
>>>> Hi all,
>>>>
>>>> I am using the GSVA package for some analysis. I found that the package
>>>> only takes the gene expression matrix annotated with Affymetrix probe IDs,
>>>> although the gene set collection is made of Entrez IDs. I imagine there is a
>>>> step in the package for converting the Affymetrix probe IDs to Entrez IDs.
>>>> As my data are from the Illumina platform, I am wondering if an expression
>>>> matrix annotated with Entrez IDs can be used directly.
>>>>
>>>> Thank you,
>>>> Wendy
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>>