[BioC] Affymetrix mouse 430_2 array - gene expression and annotation

Mon Jul 21 18:42:42 CEST 2014

Hi Xiayu,

On 7/21/2014 12:08 PM, Rao,Xiayu wrote:
> Hello,
>
> I am now analyzing Affymetrix mouse 430_2 array, and need
> clarification for the following issues.
>
> 1) how to summarize the probe expression to the expression level of
> transcript/genes? We are interested in the gene expression. I know
> that for human 1.0 ST gene array, we can use oligo package to get
> transcript expression. And for illumina array, there are only few
> probes designed for each gene, so we can only look at the probe
> level. For this mouse 430_2 array, there are usually 11 probes. I am
> thinking that using rma may not be enough.

I'm not sure I follow your logic. Over time, the number of probes per
probeset has steadily fallen, to the point that the Exon arrays (and
HTA, for that matter) now have four or fewer probes per probeset. The
Gene ST arrays have more, in general, when summarized at the transcript
level, but that is simply because Affy combined exon probesets together.
If you summarize the Gene ST arrays at the probeset level, you have
mostly four or fewer (!) probes per probeset.

So the old style 3'-biased arrays have in comparison a luxurious number 
of probes for rma() to summarize. You can use oligo for these arrays, or 
affy if you prefer. You will get identical results.
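
To make the summarization step concrete, here is a toy sketch in base R
of the median polish that rma() uses to collapse the probes of one
probeset into a single value per array. The intensities are invented for
illustration, and the background correction and quantile normalization
that rma() also performs are left out, so treat this as a picture of the
idea rather than a replacement for rma().

```r
# Toy log2 intensities for ONE probeset: 11 probes (rows) x 4 arrays (columns).
# All numbers are invented: each value is a per-probe affinity effect plus
# a per-array expression level.
probe_effect <- c(-1.5, -1.0, -0.6, -0.3, -0.1, 0.0, 0.2, 0.4, 0.7, 1.1, 1.6)
array_level  <- c(7.0, 7.5, 9.0, 9.2)
x <- outer(probe_effect, array_level, "+")

# Median polish fits: x[i, j] ~ overall + row[i] + col[j] + residual[i, j]
fit <- medpolish(x, trace.iter = FALSE)

# The probeset-level expression for each array is the overall term
# plus that array's column effect.
probeset_expr <- fit$overall + fit$col
print(probeset_expr)
```

With 11 probes the medians are taken over a reasonable number of values,
which is the sense in which these arrays are comfortable territory for
rma().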

>
> 2) and add annotation thereafter? For the transcript level
> annotation, I have used the following code before. But not sure for
> this mouse array, is there a similar way or similar transcript
> database to do such? I know there is a database called mouse4302.db.
>
> ID <- featureNames(geneCore2)
> Symbol <- getSYMBOL(ID, "hugene10sttranscriptcluster.db")
> fData(geneCore2) <- data.frame(ID=ID, Symbol=Symbol)

This is an old way of annotating things, and has been superseded (for
about five years now) by a more compact API:

fData(geneCore2) <- select(mouse4302.db, featureNames(geneCore2), "SYMBOL")

And note you can add in other more useful things like the Gene ID as
well. While biologists tend to like HUGO symbols, they are not, as
advertised, actually unique, so you always run the risk of thinking you
have <a gene you care about> when in fact you are looking at the data
for <some other gene with the same HUGO symbol>.

fData(geneCore2) <- select(mouse4302.db, featureNames(geneCore2), 
c("SYMBOL","GENENAME","ENTREZID"))
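
One caveat when assigning the result of select() straight into fData():
select() can return more than one row for a probeset that maps to
several genes (it reports a 1:many mapping when that happens), and the
feature data needs one row per feature. A minimal sketch of one way to
handle that, in base R, using a made-up data.frame in place of real
select() output. The probe IDs and symbols below are hypothetical, and
keeping the first mapping is just one simple choice among several.

```r
# Hypothetical feature names, standing in for featureNames(geneCore2).
ids <- c("probe_1_at", "probe_2_at", "probe_3_at")

# Hypothetical stand-in for what select() returns: a PROBEID column plus
# the requested columns, with probe_2_at mapping to two genes.
annot <- data.frame(
  PROBEID = c("probe_1_at", "probe_2_at", "probe_2_at", "probe_3_at"),
  SYMBOL  = c("GeneA", "GeneB", "GeneC", "GeneD"),
  stringsAsFactors = FALSE
)

# Keep only the first mapping for each probeset.
annot1 <- annot[!duplicated(annot$PROBEID), ]

# Put the rows in the same order as the features and label them by ID,
# so there is exactly one row per feature.
annot1 <- annot1[match(ids, annot1$PROBEID), ]
rownames(annot1) <- ids
print(annot1)
```

With real data, annot would come from the select() call above and ids
from featureNames(geneCore2); annot1 is then what you would assign to
fData(geneCore2).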

Best,

Jim

>
> Any input would be very appreciated! Thank you very much in advance.
>
> Thanks, Xiayu
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099