[BioC] Affy: probeset to gene expression with expresso

Martin Preusse martin.preusse at gmail.com
Tue Aug 13 21:26:32 CEST 2013


True … ;) There are a lot of algorithms (i.e., papers) that try to evaluate each probe set and answer this question.

E.g. this one:  

Jetset: selecting the optimal microarray probe set to represent a gene
http://www.biomedcentral.com/1471-2105/12/474

Unfortunately, as is so often the case in bioinformatics, there are a lot of papers … and no validation or comparison between them.
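
For what it is worth, here is a minimal sketch in base R of two simple ways to collapse probe sets to genes: averaging, and keeping the probe set with the highest mean intensity. The expression values and the `probe2gene` mapping are made up for illustration; in practice the mapping would come from the chip's annotation package. The selection rule is a plain heuristic, not the Jetset score.

```r
# Toy expression matrix: rows are probe sets, columns are samples (made-up values).
expr <- matrix(c(8, 10, 5, 7,
                 9, 11, 6, 7),
               nrow = 4,
               dimnames = list(c("ps1", "ps2", "ps3", "ps4"),
                               c("sampleA", "sampleB")))

# Hypothetical probe set -> Entrez gene mapping; NA marks an unannotated probe set.
probe2gene <- c(ps1 = "1001", ps2 = "1001", ps3 = "2002", ps4 = NA)

# Drop probe sets without a gene annotation.
keep  <- !is.na(probe2gene[rownames(expr)])
sub   <- expr[keep, , drop = FALSE]
genes <- probe2gene[rownames(sub)]

# Approach 1: average all probe sets of a gene.
# rowsum() adds up rows sharing a group label; dividing by group size gives the mean.
sums     <- rowsum(sub, group = genes)
gene_avg <- sums / as.vector(table(genes)[rownames(sums)])

# Approach 2: keep one probe set per gene, here the one with the highest mean intensity.
mean_int <- rowMeans(sub)
best <- vapply(split(names(mean_int), genes),
               function(ps) ps[which.max(mean_int[ps])],
               character(1))
gene_best <- sub[best, , drop = FALSE]
rownames(gene_best) <- names(best)

print(gene_avg)
print(gene_best)
```

Both results are gene-by-sample matrices with Entrez IDs as row names, so either one can be written out the same way as the probe set level data.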

Martin  


On Tuesday, 13 August 2013 at 19:47, James W. MacDonald wrote:

> I don't think there is an answer to these questions. Well, I think there  
> might be several hundred or maybe thousands of answers (e.g., for each  
> gene that is measured more than once there might be something reasonable  
> to do, based on what the duplicates are measuring), but we can only do  
> things in aggregate, and I don't think there is a simple solution that  
> can be applied en masse to all duplicated transcripts without making  
> pretty strong assumptions.
>  
> Because of this, I tend to default to the status quo and just report  
> probeset level data because I don't have any idea what the 'right' thing  
> to do is.
>  
> Best,
>  
> Jim
>  
>  
>  
> On 8/13/2013 12:03 PM, Martin Preusse wrote:
> > I am trying to figure out the same. There are ENDLESS publications dealing with exactly this topic.
> >  
> > Obviously, different probes bind to different parts of the transcript. So they might represent different transcripts of the same gene or genomic locus.
> >  
> > Maybe a mapping to transcript instead of gene is more useful. Another issue is that not all probes bind to the transcript with the same affinity. Some probes might even be pure noise. So if you average all of them the noise could cancel the signal from the more useful probes.
> >  
> > I am trying to dig deeper into this, but there is too much published … Does anyone have tips for good papers/reviews? Or maybe good books that help with getting into microarray analysis?
> >  
> > Martin
> >  
> >  
> > On Tuesday, 13 August 2013 at 17:49, Helen Smith wrote:
> >  
> > > Hi,
> > >  
> > > Thank you Jim.
> > >  
> > > Can I ask: I have always averaged the expression values and then run the pathway analysis on the genes rather than the probes. Do you consider it better to keep the individual probes and assess their expression individually at the pathway level?
> > > I'm torn as to which is the best approach,
> > >  
> > > Thanks,
> > > Helen
> > >  
> > > -----Original Message-----
> > > From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of James W. MacDonald
> > > Sent: 13 August 2013 16:28
> > > To: Martin Preusse
> > > Cc: bioconductor at r-project.org
> > > Subject: Re: [BioC] Affy: probeset to gene expression with expresso
> > >  
> > > Hi Martin,
> > >  
> > > I just answered a very closely related question. See if this helps:
> > >  
> > > https://stat.ethz.ch/pipermail/bioconductor/2013-August/054353.html
> > >  
> > > Best,
> > >  
> > > Jim
> > >  
> > >  
> > >  
> > > On 8/13/2013 9:47 AM, Martin Preusse wrote:
> > > > I am trying to get gene-level expression values from an Affy microarray, i.e. merge the values for probe sets representing the same gene.
> > > >  
> > > > I tried to use the 'expresso' function from the affy package, but I always end up with an ExpressionSet containing probe sets, not genes.
> > > >  
> > > > What is an easy way to summarize/merge probe sets to (entrez) genes?
> > > >  
> > > >  
> > > > library(affydata)
> > > > library(affy)
> > > >  
> > > > # get the 'Dilution' affy batch
> > > > data(Dilution)
> > > >  
> > > > eset <- expresso(Dilution, bgcorrect.method='rma',
> > > >                  normalize.method='constant', pmcorrect.method='pmonly',
> > > >                  summary.method='avgdiff')
> > > >  
> > > >  
> > > > write.exprs(eset,'testfile.txt')
> > > >  
> > > >  
> > > > P.S.: I know it might not be the best idea to average probe sets, but
> > > > I would like to try ;)
> > > >  
> > > > Cheers
> > > > Martin
> > > >  
> > > > _______________________________________________
> > > > Bioconductor mailing list
> > > > Bioconductor at r-project.org
> > > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > > Search the archives:
> > > > http://news.gmane.org/gmane.science.biology.informatics.conductor
> > >  
> > >  
> > >  
> > >  
> > >  
> > > --
> > > James W. MacDonald, M.S.
> > > Biostatistician
> > > University of Washington
> > > Environmental and Occupational Health Sciences
> > > 4225 Roosevelt Way NE, # 100
> > > Seattle WA 98105-6099
> > >  
> >  
>  
>  
>  
> --  
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099


