[BioC] Assigning gene symbols to Affymetrix data and averaging probes

Wed Oct 3 20:29:04 CEST 2012

Hi Jim

Thanks, the reannotation worked a treat. I've been able to export the normalized data in annotated format.

I am averse to removing probes that have no Entrez ID associated with them, as I want to put the whole set of data through limma. I can't use the annotated expr.loess in lmFit, but is there a way I can get the symbol information into the output of lmFit (for instance, as fit$symbol)?

Best wishes
Lesley

________________________________________
From: James W. MacDonald [jmacdon at uw.edu]
Sent: 03 October 2012 16:30
To: Hoyles, Lesley
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Assigning gene symbols to Affymetrix data and averaging probes

Hi Lesley,

On 10/3/2012 10:55 AM, Hoyles, Lesley wrote:
>  Hi
>
>  I have processed my affy data and am able to annotate the object
>  mice.loess using the following.
>
>  ID <- featureNames(mice.loess)
>  Symbol <- getSYMBOL(ID,'mouse4302.db')
>  fData(mice.loess) <- data.frame(ID=ID,Symbol=Symbol)
>
>
>  However, when I convert my object as follows - expr.loess <-
>  exprs(mice.loess) - I lose the annotation and have been unable to
>  find a way to annotate expr.loess. Please could anybody suggest how I
>  can annotate expr.loess?
expr.loess <- data.frame(ID = ID, Symbol = Symbol, exprs(mice.loess))

>
>
>  Is there a way of averaging probes for each gene with Affymetrix
>  data? I've been able to do this with single-channel Agilent data
>  using the example given in the limma guide.

There are probably two reasonable ways to do this. First, the easiest.

dat <- ReadAffy(cdfname = "mouse4302mmentrezcdf")

and proceed from there. This will use the MBNI re-mapped CDF package
based on Entrez Gene IDs, and you will have a single value per gene
after summarization. There are other ways to map the probes; see
http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp
at the bottom of the page for more info.

Alternatively if you want to stick with the original probesets, the
problem arises that some probesets are not well annotated, so what to do
with those? In addition, gene symbols are not guaranteed to be unique,
so you can't just assume that they are. Entrez Gene and UniGene IDs are
supposed to be unique, so you could go with them, doing something like
(untested)

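library(mouse4302.db)  ## provides mouse4302ENTREZID and toTable()
gns <- toTable(mouse4302ENTREZID)
## expr.loess is the data.frame I suggest above
alldat <- merge(gns, expr.loess, by = 1)
alldatlst <- tapply(1:nrow(alldat), alldat$gene_id, function(x) alldat[x,])
## keep the first row's annotation and average the expression columns
combined.data <- do.call("rbind", lapply(alldatlst, function(x)
    data.frame(x[1, 1:3], t(colMeans(x[, -c(1:3), drop = FALSE])))))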

Here I am assuming that after the merge() step the first three columns
are the probeset ID, gene_id, symbol, and the remaining columns are the
expression values. You will lose all data for which there isn't an
Entrez Gene ID, but the same is true of the MBNI method I outline above.
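
To make the averaging step concrete, here is a minimal sketch on made-up data, using only base R. The data.frame below is a hypothetical stand-in for the merged table (probeset ID, gene_id, symbol, then expression columns); the probe names, IDs, symbols and values are invented for illustration and do not come from any real array.

```r
## Toy stand-in for the merged table: probeset ID, Entrez Gene ID, symbol,
## then one column per array. All names and values are made up.
alldat <- data.frame(
    probe_id = c("p1", "p2", "p3", "p4"),
    gene_id  = c("101", "101", "202", "303"),
    Symbol   = c("GeneA", "GeneA", "GeneB", "GeneC"),
    array1   = c(2, 4, 5, 7),
    array2   = c(6, 8, 1, 3),
    stringsAsFactors = FALSE
)

## Average the expression columns within each Entrez Gene ID.
avg <- aggregate(alldat[, c("array1", "array2")],
                 by = list(gene_id = alldat$gene_id), FUN = mean)

## Re-attach one symbol per gene (the first one seen for that gene_id).
avg$Symbol <- alldat$Symbol[match(avg$gene_id, alldat$gene_id)]
avg
```

Probesets p1 and p2 share a gene_id, so they collapse to a single row holding their mean. If the expression values are kept as a matrix instead, limma's avereps() does a similar averaging given a vector of IDs.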

Best,

Jim

>
>
>  Thanks in advance for your help.
>
>  Best wishes Lesley

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099