[BioC] Affymetrix mouse 430_2 array - gene expression and annotation

James W. MacDonald jmacdon at uw.edu
Mon Jul 21 19:44:35 CEST 2014


Hi Xiayu,

On 7/21/2014 1:19 PM, Rao,Xiayu wrote:
> Hi, Jim
>
> Thanks a lot for your prompt reply and detailed explanation. You are always very helpful.
>
> So did you mean that I can use either of the following to get
> transcript/gene expression for the mouse 430_2 array and other 3'-based
> arrays?
>        oligo: geneCore <- rma(mydata, target = "core")

There is no such concept (and no such argument) for the 3'-biased arrays. From ?rma:

    ## S4 method for signature 'ExpressionFeatureSet'
    rma(object, background=TRUE, normalize=TRUE, subset=NULL)

You can only summarize the 3'-biased arrays at one level, because there 
is only one level. In other words, unlike the Gene ST and Exon ST 
arrays, each probe belongs only to a single probeset, and there are no 
alternative (Affy-sanctioned) ways to combine probes into probesets.

So these are equivalent:

oligo::rma(mydata)
affy::rma(mydata)
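
If you want to check the equivalence on your own data, here is a minimal sketch. The helper name compare_rma and its cel_dir argument are made up for illustration, and it assumes that affy, oligo and Biobase are installed along with the platform packages (mouse4302cdf for affy, pd.mouse430.2 for oligo). Rows are matched by probeset ID because the two packages are not guaranteed to return them in the same order.

```r
# Hypothetical helper (not part of affy or oligo): run RMA on the same
# CEL files with both packages and return the largest absolute
# difference between the two expression matrices.
compare_rma <- function(cel_dir) {
  # Collect the CEL files in the directory (case-insensitive extension).
  cel_files <- list.files(cel_dir, pattern = "\\.cel$",
                          ignore.case = TRUE, full.names = TRUE)
  stopifnot(length(cel_files) > 0)

  # affy reads into an AffyBatch, oligo into an ExpressionFeatureSet.
  eset_affy  <- affy::rma(affy::ReadAffy(filenames = cel_files))
  eset_oligo <- oligo::rma(oligo::read.celfiles(filenames = cel_files))

  x_affy  <- Biobase::exprs(eset_affy)
  x_oligo <- Biobase::exprs(eset_oligo)

  # Align rows by probeset ID before comparing; columns follow the
  # order of cel_files in both cases.
  common <- intersect(rownames(x_affy), rownames(x_oligo))
  stopifnot(length(common) > 0)
  max(abs(x_affy[common, , drop = FALSE] - x_oligo[common, , drop = FALSE]))
}
```

Calling compare_rma("path/to/celfiles") should then return a value at or very near zero if the two summaries agree.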

Best,

Jim



>        affy: rma(mydata)
>
> Thanks,
> Xiayu
>
>
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu]
> Sent: Monday, July 21, 2014 11:43 AM
> To: Rao,Xiayu; 'bioconductor at r-project.org'
> Subject: Re: [BioC] Affymetrix mouse 430_2 array - gene expression and annotation
>
> Hi Xiayu,
>
> On 7/21/2014 12:08 PM, Rao,Xiayu wrote:
>> Hello,
>>
>> I am now analyzing Affymetrix mouse 430_2 array, and need
>> clarification for the following issues.
>>
>> 1) how to summarize the probe expression to the expression level of
>> transcript/genes? We are interested in the gene expression. I know
>> that for human 1.0 ST gene array, we can use oligo package to get
>> transcript expression. And for illumina array, there are only few
>> probes designed for each gene, so we can only look at the probe level.
>> For this mouse 430_2 array, there are usually 11 probes. I am thinking
>> that using rma may not be enough.
>
> I'm not sure I follow your logic. Over time, the number of probes per probeset has steadily fallen, to the point that the Exon arrays (and HTA, for that matter) now have only four probes per probeset (or fewer). The Gene ST arrays have more, in general, when summarized at the transcript level, but that is simply because Affy combined exon probesets together. If you summarize the Gene ST arrays at the probeset level, you have mostly four or fewer (!) probes per probeset.
>
> So the old style 3'-biased arrays have in comparison a luxurious number of probes for rma() to summarize. You can use oligo for these arrays, or affy if you prefer. You will get identical results.
>
>
>>
>> 2) and add annotation thereafter? For the transcript level annotation,
>> I have used the following code before. But not sure for this mouse
>> array, is there a similar way or similar transcript database to do
>> such? I know there is a database called mouse4302.db.
>> ID <- featureNames(geneCore2)
>> Symbol <- getSYMBOL(ID,"hugene10sttranscriptcluster.db")
>> fData(geneCore2) <- data.frame(ID=ID,Symbol=Symbol)
>
> This is an old way of annotating things, and has been superseded (for like five years now) by a more compact API:
>
> fData(geneCore2) <- select(mouse4302.db, featureNames(geneCore2), "SYMBOL")
>
> And note you can add in other more useful things like the Gene ID as well (while biologists tend to like HUGO symbols, they are not, as advertised, actually unique, so you always run the risk of thinking you have <a gene you care about> when in fact you are looking at the data for <some other gene with the same HUGO symbol>).
>
> fData(geneCore2) <- select(mouse4302.db, featureNames(geneCore2),
> c("SYMBOL","GENENAME","ENTREZID"))
>
>
> Best,
>
> Jim
>
>
>>
>> Any input would be very appreciated! Thank you very much in advance.
>>
>> Thanks, Xiayu
>>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


