[BioC] affy ids from gene symbols
mcarlson at fhcrc.org
Fri Mar 7 22:20:08 CET 2008
James W. MacDonald wrote:
> Say your input vector of symbol names is called 'input'.
> complist <- vector("list", length(input))
> names(complist) <- input
> library(hgu133plus2.db) ## note I am using the new package type!!
> mapp <- toTable(hgu133plus2SYMBOL)
> for(i in seq_along(input))
>   complist[[i]] <- mapp[grep(paste("^", input[i], "$", sep=""), mapp[,2]), 1]
> Depending on whether you can assume your symbols exactly match the
> annotation package symbols, you might want to add a tolower(), and
> possibly a gsub() to remove characters like '(', ')', '-', etc.
> IAIN GALLAGHER wrote:
>> Hello list.
>> I would like to return the affymetrix probe ids for a list of genes. Normally I would do this through biomaRt but the service is down all weekend.
>> I know the probe ids can be returned one at a time using regular expressions via
>>> gene1<-grep('^COPA$', symbols)
>> but I was wondering if there was a way to loop through the list of genes and 'grep' each one individually.
>> Thanks for any advice.
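As an aside, the grep() loop quoted above can also be written with exact matching and no per-symbol regular expression (a '.' in a symbol would otherwise match any character). This is a minimal sketch: it assumes a two-column table shaped like the output of toTable(hgu133plus2SYMBOL), probe IDs first and symbols second. The toy `mapp` below and the helpers `clean()` and `lookup_probes()` are hypothetical and only stand in for the real objects; the cleanup mirrors the tolower()/gsub() suggestion, using upper case instead.

```r
# Hypothetical stand-in for toTable(hgu133plus2SYMBOL):
# column 1 = probe IDs, column 2 = gene symbols (made-up values).
mapp <- data.frame(
  probe_id = c("probe_1", "probe_2", "probe_3", "probe_4"),
  symbol   = c("COPA", "COPA", "GENE1", "GENE-2"),
  stringsAsFactors = FALSE
)

# Hypothetical cleanup helper: trim, upper-case, and strip '(', ')' and '-'.
# It is applied to both the input and the table so they stay comparable.
clean <- function(x) gsub("[()-]", "", toupper(trimws(x)))

lookup_probes <- function(input, mapp) {
  # One list element per distinct cleaned symbol, holding its probe IDs
  by_symbol <- split(mapp[, 1], clean(mapp[, 2]))
  res <- by_symbol[clean(input)]
  names(res) <- input  # keep the user's original spelling as names
  # Unmatched symbols come back as NULL; turn them into empty vectors
  lapply(res, function(p) if (is.null(p)) character(0) else p)
}

input <- c("COPA", "gene-2", "NOTAGENE")
complist <- lookup_probes(input, mapp)
print(complist)
```

Stripping punctuation can make two distinct symbols collide, so it is worth checking the cleaned names before relying on them.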
So if you have gene symbols and you want to match them to probes, you
could also use the new "hgu133plus2ALIAS2PROBE" mapping found inside the
hgu133plus2.db package. It maps ALL known gene symbols (aliases included),
not just the most commonly used (standard) ones. This helps if your list
uses less common symbols for some of the genes you are looking for.
So, to summarize, there are two mappings that can help you:
1. "hgu133plus2SYMBOL", which maps each probe to the most common
   "standard" gene symbol (only one per gene).
2. "hgu133plus2ALIAS2PROBE", which maps every gene symbol known to NCBI
   (aliases included) to its probes.
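One way to combine the two, sketched under assumptions: look each symbol up in the standard symbol table first, and fall back to the alias table only when that fails. Both tables below are hypothetical toy data frames with made-up probe IDs, standing in for tables you would have extracted from hgu133plus2.db beforehand; `probes_with_fallback()` is likewise a hypothetical helper, not part of the package.

```r
# Hypothetical standard-symbol table: column 1 = probe, column 2 = symbol.
symbol_tab <- data.frame(
  probe_id = c("probe_1", "probe_2", "probe_3"),
  symbol   = c("COPA", "COPA", "GENE1"),
  stringsAsFactors = FALSE
)
# Hypothetical alias table: every known name for a gene, standard one included.
alias_tab <- data.frame(
  probe_id = c("probe_1", "probe_2", "probe_3", "probe_3"),
  alias    = c("COPA", "COPA", "GENE1", "OLDNAME1"),
  stringsAsFactors = FALSE
)

probes_with_fallback <- function(input, symbol_tab, alias_tab) {
  std   <- split(symbol_tab[, 1], symbol_tab[, 2])
  alias <- split(alias_tab[, 1], alias_tab[, 2])
  res <- lapply(input, function(s) {
    hit <- std[[s]]                        # exact match on standard symbol
    if (is.null(hit)) hit <- alias[[s]]    # otherwise try any known alias
    if (is.null(hit)) character(0) else unique(hit)
  })
  names(res) <- input
  res
}

res <- probes_with_fallback(c("COPA", "OLDNAME1", "NOTAGENE"),
                            symbol_tab, alias_tab)
print(res)
```

Trying the standard symbol first keeps the common case unambiguous and only widens the search for names the standard table does not know.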
The danger in using the first of these is that if your list contains an
unusual symbol name, you might not get a match. The danger in using the
second is that if your symbol list contains two different names for the
same gene, each of them will match, and you may not notice that you have
hit the same gene twice.
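To catch that second case, one option (a sketch, not a feature of either mapping) is to check whether two different input symbols resolved to the same probe IDs. The `hits` list below is hypothetical, shaped like the per-symbol lists built in the loop above, with made-up probe IDs.

```r
# Hypothetical lookup result: input symbol -> probe IDs (made-up values).
# "ALIAS_A" and "GENE1" stand for two names of the same gene.
hits <- list(
  COPA    = c("probe_1", "probe_2"),
  ALIAS_A = "probe_3",
  GENE1   = "probe_3",
  NOHIT   = character(0)
)

# Flatten to one row per (input symbol, probe) pair
pairs <- data.frame(
  input = rep(names(hits), lengths(hits)),
  probe = unlist(hits, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Count how many distinct input symbols reach each probe
n_inputs <- tapply(pairs$input, pairs$probe, function(x) length(unique(x)))
shared   <- names(n_inputs)[n_inputs > 1]

# For each shared probe, list the input symbols that point at it
keep <- pairs$probe %in% shared
collisions <- split(pairs$input[keep], pairs$probe[keep])
print(collisions)
```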