[BioC] Simple annotate question: difference in getSYMBOL and lookUp

James W. MacDonald jmacdon at med.umich.edu
Mon Jan 28 16:05:31 CET 2008


Hi Merja,

merja matilainen wrote:
> Hi!
> 
> Can someone explain to me the difference between the following two
> attempts to add annotation data to my genes (I have a dataset from
> Illumina arrays)?
> 
> This one works:
>> geneSymbol=getSYMBOL(fit2$genes$ID, 'lumiHumanV2') 
>> fit2$genes=data.frame(fit2$genes, geneSymbol=geneSymbol)
> 
> Here I get an error:
>> geneEntrez=lookUp(fit2$genes$ID, 'lumiHumanV2', 'ENTREZID') 
>> fit2$genes=data.frame(fit2$genes, geneEntrezID=geneEntrez)
> Error in data.frame(fit2$genes, geneEntrezID = geneEntrez) : 
> arguments imply differing number of rows: 48701, 1

In this case you can answer the question yourself by printing the two 
functions at the prompt:

 > getSYMBOL
function (x, data)
{
     unlist(lookUp(x, data, "SYMBOL"))
}
<environment: namespace:annotate>
 > lookUp
function (x, data, what, load = FALSE)
{
     if (length(x) < 1) {
         stop("No keys provided")
     }
     mget(x, envir = getAnnMap(what, chip = data, load = load),
         ifnotfound = NA)
}
<environment: namespace:annotate>

So getSYMBOL() is just lookUp() with what = "SYMBOL", wrapped in a call 
to unlist().

mget() returns a list, and unlist() turns a list into a vector. So if 
you wrap your call to lookUp() in unlist(), you should get a vector that 
data.frame() can use as a column.
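
A minimal sketch of the difference, using a plain environment as a 
stand-in for the annotation map so it runs without annotate or 
lumiHumanV2 installed. The probe IDs and Entrez values are made up for 
illustration, and the object names are hypothetical:

```r
# Toy stand-in for an annotation map: an environment keyed by probe ID.
# (Assumption: made-up IDs and values, not real lumiHumanV2 content.)
ann <- new.env()
assign("probe1", "1001", envir = ann)
assign("probe2", "1002", envir = ann)
assign("probe3", "1003", envir = ann)

ids   <- c("probe1", "probe2", "probe3")
genes <- data.frame(ID = ids, stringsAsFactors = FALSE)

# mget() is what lookUp() calls internally; it returns a named list
# with one element per key.
entrez_list <- mget(ids, envir = ann, ifnotfound = NA)
is.list(entrez_list)

# Coerced to a data.frame, each list element becomes its own column,
# so the result has a single row instead of one row per probe.
dim(as.data.frame(entrez_list))

# unlist() flattens the list into a vector with one value per probe,
# which fits as a column next to the existing gene table.
entrez_vec <- unlist(entrez_list)
genes2 <- data.frame(genes, geneEntrezID = entrez_vec,
                     stringsAsFactors = FALSE)
nrow(genes2)
```

With the real packages the equivalent would presumably be 
unlist(lookUp(fit2$genes$ID, 'lumiHumanV2', 'ENTREZID')), subject to the 
caveat about multiple mappings.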

Note, however, that this will not always work so cleanly. If any of the 
Illumina IDs map to more than one Entrez Gene ID (I don't think they 
should), then the unlisted vector will have more elements than there are 
probes, and you won't be able to make a data.frame. You can always check 
this first with something like:

table(sapply(geneEntrez, length))
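
As a rough sketch of what that check catches and two possible ways to 
deal with it, here is a hand-built list standing in for the lookUp() 
result. The values are invented, and taking the first hit or collapsing 
the hits are only suggestions, not something annotate does for you:

```r
# Hypothetical lookUp()-style result: probe2 maps to two IDs and
# probe3 was not found (NA).
entrez_list <- list(probe1 = "1001",
                    probe2 = c("1002", "2002"),
                    probe3 = NA)

# Number of values per probe; anything other than 1 needs attention.
n_per_probe <- sapply(entrez_list, length)
table(n_per_probe)

# unlist() now yields 4 values for 3 probes, so it no longer lines up
# with the rows of the gene table.
length(unlist(entrez_list))

# Option 1: keep only the first ID for each probe.
first_hit <- vapply(entrez_list,
                    function(x) as.character(x[1]),
                    character(1))

# Option 2: collapse multiple IDs into one string, keeping NA as NA.
collapsed <- vapply(entrez_list, function(x) {
    if (all(is.na(x))) NA_character_ else paste(x, collapse = ";")
}, character(1))
```

Either result has exactly one element per probe, so it can be added as a 
data.frame column.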

You might also want to extract the results from your fit2 object into a 
new data.frame rather than overwriting the existing object (you are 
making copies regardless).

Best,

Jim


> 
> I get the same error if I try to look for example for gene function.
> 
> I assume the answer lies in the type of data structure these two
> functions return. If I understood the vignette correctly, getSYMBOL
> gives me a vector and lookUp gives me a list. (The help topic says
> 'Either a vector or a list depending on whether multiple values per
> input are possible'.) Unfortunately I am not that familiar with R data
> structures yet. Could you tell me how I can add the Entrez result to
> fit2$genes? And perhaps explain why the match to the gene symbol does
> not give multiple values if the description does.
> 
> Thanks for your help!
> 
> Merja

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


