[BioC] Use probesets with highest baseline expression for differential gene expression in LIMMA

Mon Feb 27 09:10:35 CET 2012

Hello Jim,
Thank you very much for your detailed reply. I did indeed have some misconceptions about LIMMA. Unfortunately, I am not in charge of the methodology in this case, and the requirement is to ignore the expression values of the other probesets and keep only the probeset with the maximum expression value for each gene symbol.
I am afraid I am unable to use the findLargest() function from the genefilter package, since it needs Entrez ID annotation and I am using annotation from a tab-delimited text file. I am working with the Human Gene 1.0 ST Array, and the relevant packages do not exist for the latest version of R. I will try to adapt the function to my needs.
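Since findLargest() is tied to Entrez IDs, the same "maximum expression per gene symbol" rule can be applied directly from a tab-delimited annotation file in base R. This is a minimal sketch under assumptions: the column names probeset_id and gene_symbol, the probeset IDs, the symbols and the expression values are all made up, and the inline text stands in for the real file (which would be read with read.delim() and its real file name).

```r
# Stand-in for the tab-delimited annotation file; the column names
# probeset_id and gene_symbol are hypothetical, adjust them to the real header.
annot_txt <- "probeset_id\tgene_symbol
1000001\tGENE_A
1000002\tGENE_A
1000003\tGENE_B
1000004\tGENE_B"
# colClasses = "character" keeps numeric-looking probeset IDs as text
annot <- read.delim(text = annot_txt, colClasses = "character")

# Made-up log2 expression matrix; row names are probeset IDs
expr <- matrix(c(5, 6, 7,
                 9, 9, 8,
                 4, 4, 5,
                 3, 2, 3),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("1000001", "1000002", "1000003", "1000004"),
                               c("s1", "s2", "s3")))

# Look up each probeset's gene symbol; unannotated probesets get NA
sym <- annot$gene_symbol[match(rownames(expr), annot$probeset_id)]
annotated <- which(!is.na(sym) & sym != "")

# Per-probeset maximum expression, the criterion requested here
stat <- apply(expr, 1, max)

# For each gene symbol, the row index of the probeset with the largest maximum
best <- tapply(annotated, sym[annotated],
               function(i) i[which.max(stat[i])])

# Keep one row per gene symbol and label the rows by symbol
expr_best <- expr[as.integer(best), , drop = FALSE]
rownames(expr_best) <- names(best)
expr_best
```

Ties are broken by which.max(), which keeps the first probeset listed for that symbol.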

I also tried the solution provided by Gordon but encountered memory errors. I will have to try it on a machine with more RAM.

Thanks and Regards,
Ekta

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at uw.edu] 
Sent: 23 February 2012 19:55
To: Ekta Jain
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Use probesets with highest baseline expression for differential gene expression in LIMMA

Hi Ekta,

On 2/22/2012 10:06 PM, Ekta Jain wrote:
> Hi Jim,
> I am using Affymetrix chip data. I need to analyse my dataset for differential gene expression (LIMMA). Each gene can be referenced by multiple probesets, and while performing LIMMA the expression values of these multiple probesets get averaged and this averaged value is assigned to that gene. I need to be able to simply select the probeset with the highest expression value to represent a gene.
>
> LIMMA by default averages the probeset values.

This is not true. The limma package doesn't know or care that two 
probesets are intended to interrogate the same gene, and doesn't do the 
averaging that you think it does. You can't even fit a mixed model using 
the 'duplicate' probesets, because they aren't duplicates and you don't 
have the same number of probesets per gene. What limma does is make 
univariate comparisons per probeset, so if you have four probesets that 
interrogate the same gene transcript, then you will do four tests.
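To make the "one test per probeset" point concrete, here is a small base-R sketch with random data for four probesets assumed to target the same transcript. It uses an ordinary Welch t-test from t.test() rather than limma's moderated statistic, purely to show that each row gets its own test; the group labels and values are hypothetical.

```r
set.seed(1)
# Four hypothetical probesets for one transcript, 3 control and 3 treated samples
expr <- matrix(rnorm(24, mean = 8), nrow = 4,
               dimnames = list(paste0("ps", 1:4), paste0("s", 1:6)))
group <- rep(c("ctrl", "trt"), each = 3)

# One univariate test per row (probeset); nothing is averaged across rows
pvals <- apply(expr, 1, function(x)
  t.test(x[group == "trt"], x[group == "ctrl"])$p.value)
pvals
```

The result holds four p-values, one per probeset, which is the situation described above.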

Now you could make the assumption (unfounded, IMO) that all the 
probesets that are intended to measure a particular transcript are 
really measuring the same thing, and then choose to use just one of them 
based on some metric. As an example, you could use 'highest expression 
value', which doesn't make any sense to me.

To expound on that last statement, let's say you have two probesets 
that are purported to measure the same gene. Let's further stipulate 
that one of these probesets has really high expression (somewhere around 
2^14), but the expression isn't materially different between any of your 
samples. In addition, the other probeset has almost undetectable 
expression in one set of samples, but middling expression (say 
2^8) in another set. Do you really want to throw out the latter probeset 
in favor of the former?
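The scenario above can be reproduced with a toy example. The numbers are made up to match the description (log2 scale, three control and three treated samples), and the simple difference of group means stands in for a proper limma fit.

```r
# Made-up log2 values for two probesets assumed to target the same gene,
# three control and three treated samples, mirroring the scenario above
group <- rep(c("ctrl", "trt"), each = 3)
expr <- rbind(
  probeset1 = c(14.0, 14.1, 13.9, 14.0, 13.9, 14.1),  # ~2^14 everywhere, flat
  probeset2 = c( 2.0,  2.2,  1.8,  8.0,  8.1,  7.9)   # near background vs ~2^8
)

# The "highest expression" rule keeps probeset1
chosen <- names(which.max(apply(expr, 1, max)))

# Difference in group means (trt - ctrl) on the log2 scale
lfc <- rowMeans(expr[, group == "trt"]) - rowMeans(expr[, group == "ctrl"])
chosen
lfc
```

The rule keeps probeset1, whose log2 difference is about 0, and discards probeset2, whose difference is about 6 (roughly 64-fold).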

Now back to your question. If you want to pre-filter the data (again, 
not recommended with the limma package, due to the empirical Bayes 
estimator), you can use the findLargest() function in the genefilter 
package. You have to supply a test statistic to this function, for which 
you could use either rowMeans(exprs(eset)), which will give you the 
highest average expression, or something like apply(exprs(eset), 1, 
max) to get the maximum expression value.
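The selection described above can also be sketched in base R without genefilter. The helper pick_largest() below is hypothetical: it only mimics the idea behind findLargest() (keep, for each gene, the probeset with the largest statistic) and does not reproduce that function's interface. The made-up data show that rowMeans() and apply(..., 1, max) can pick different probesets for the same gene.

```r
# Hypothetical helper that mimics the idea behind genefilter's findLargest():
# for each gene ID, return the probeset whose statistic is largest.
# 'stat' is a named numeric vector (names = probeset IDs) and 'gene_id' a
# character vector of gene IDs named by probeset ID.
pick_largest <- function(stat, gene_id) {
  g <- gene_id[names(stat)]          # align gene IDs with the statistic
  ok <- !is.na(g)                    # drop probesets without a gene ID
  ids <- names(stat)[ok]
  sapply(split(ids, g[ok]), function(p) p[which.max(stat[p])])
}

# Made-up log2 expression values: rows are probesets, columns are samples
expr <- rbind(
  ps1 = c(6, 6, 6, 6),    # steady, moderate expression
  ps2 = c(1, 1, 1, 11),   # low except for one very high sample
  ps3 = c(4, 5, 4, 5)
)
gene <- c(ps1 = "GENE_A", ps2 = "GENE_A", ps3 = "GENE_B")

by_mean <- pick_largest(rowMeans(expr), gene)       # highest average
by_max  <- pick_largest(apply(expr, 1, max), gene)  # highest single value
by_mean
by_max
```

With these numbers the average picks ps1 for GENE_A, while the maximum picks ps2 because of its single high sample; both pick ps3 for GENE_B.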

Best,

Jim

>
> I am not sure if I need to modify any default settings in LIMMA or use another package.
>
> Thanks
>
> Regards,
> Ekta
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu]
> Sent: 22 February 2012 19:26
> To: Ekta [guest]
> Cc: bioconductor at r-project.org; Ekta Jain
> Subject: Re: [BioC] Use probesets with highest baseline expression for differential gene expression in LIMMA
>
> Hi Ekta,
>
> On 2/21/2012 10:57 PM, Ekta [guest] wrote:
>> Hello All,
>> I am relatively new to R and Bioconductor. I would like to know if there is a way to alter LIMMA's default options so that, instead of averaging the signal intensities of probesets, the package selects the probesets with the highest baseline expression/signal intensity.
> You will have to be more precise than that. What exactly do you mean by
> 'selects the probesets with highest baseline expression'? Do you just
> want any probesets where one or more samples has high expression? That
> doesn't require limma. Or do you want probesets where some of the
> samples have much higher expression than others?
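Each of the two readings of the question above can be written as a simple row-wise filter in base R, and neither needs limma. This is a minimal sketch: the data are made up, and the threshold of 8 (log2 scale) and the minimum spread of 2 (a 4-fold difference) are arbitrary placeholders, not recommendations.

```r
# Made-up log2 expression values: rows are probesets, columns are samples
expr <- rbind(
  ps1 = c(3, 4, 3, 4),
  ps2 = c(9, 9, 9, 9),
  ps3 = c(2, 2, 10, 10)
)
threshold  <- 8  # hypothetical "high expression" cutoff on the log2 scale
min_spread <- 2  # hypothetical minimum max-minus-min difference (4-fold)

# Reading 1: at least one sample is highly expressed
high_any <- rowSums(expr > threshold) >= 1

# Reading 2: some samples are much higher than others
spread  <- apply(expr, 1, max) - apply(expr, 1, min)
varying <- spread >= min_spread

rownames(expr)[high_any]  # probesets passing reading 1
rownames(expr)[varying]   # probesets passing reading 2
```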
>
> Best,
>
> Jim
>
>
>> Any help would be greatly appreciated.
>>
>>
>>
>>    -- output of sessionInfo():
>>
>>> sessionInfo()
>> R version 2.9.1 (2009-06-26)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_India.1252;LC_CTYPE=English_India.1252;LC_MONETARY=English_India.1252;LC_NUMERIC=C;LC_TIME=English_India.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] limma_2.18.3
>>
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> The information contained in this electronic message and in any attachments to this message is confidential, legally privileged and intended only for use by the person or entity to which this electronic message is addressed. If you are not the intended recipient, and have received this message in error, please notify the sender and system manager by return email and delete the message and its attachments and also you are hereby notified that any distribution, copying, review, retransmission, dissemination or other use of this electronic transmission or the information contained in it is strictly prohibited. Please note that any views or opinions presented in this email are solely those of the author and may not represent those of the Company or bind the Company. Any commitments made over e-mail are not financially binding on the company unless accompanied or followed by a valid purchase order. This message has been scanned for viruses and dangerous content by Mail Scanner, and is believed to be clean. The Company accepts no liability for any damage caused by any virus transmitted by this email.
> www.jubl.com
>
