[BioC] [R] Select single probe-set with median expression from multiple probe-sets corresponding to same gene -AFFY

Martin Morgan mtmorgan at fhcrc.org
Thu Apr 4 06:37:22 CEST 2013


On 04/03/2013 09:17 PM, Atul Kakrana wrote:
> Hello Martin and All,
>
> I think I was not clear with my question and would therefore like to rephrase
> it. I am analyzing Affymetrix data, and where there are multiple probe-sets for
> the same gene I need to select just one of them. The criterion I need to use is
> to select the probe-set with the highest median expression across all the
> samples.
>
> So, if there are 5 probe-sets corresponding to the same gene, then I need to
> select the one with the highest median expression across all samples to
> represent the expression of that gene. As I am trying to change from probe-set
> level to gene level analysis, I was hoping that there would already be a
> function to do this in 'affy' or 'limma'.
>
> @Martin: I think you suggested the right solution even though I was not clear
> with my question. Could you please confirm that? Also, wouldn't it be better to
> perform this step after background correction and normalization? I am very
> confused at this moment.

Yes, I did suggest the solution you were looking for. Generally, you'd like to
do these sorts of manipulations after background correction and normalization,
i.e. on esetRMA in your code below. Martin
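
To make the selection rule concrete, here is a minimal base-R sketch on made-up
data. It is not genefilter's implementation of findLargest, only an
illustration of the idea: compute each probe set's median across samples, then
keep, per gene, the probe set with the largest median. The matrix `expr`, the
mapping `gene` and all values are hypothetical; with real data the matrix would
come from exprs(esetRMA) and the mapping from the chip's annotation package.

```r
# Toy expression matrix: 5 probe sets x 4 samples (values are made up).
expr <- matrix(
  c(5, 6, 7, 8,
    9, 9, 9, 9,
    2, 3, 2, 3,
    4, 8, 6, 2,
    1, 1, 1, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("ps", 1:5), paste0("s", 1:4))
)

# Hypothetical probe-set-to-gene mapping; NA means "no gene annotation".
gene <- c(ps1 = "A", ps2 = "A", ps3 = "B", ps4 = "B", ps5 = NA)

# Median expression of each probe set across all samples.
med <- apply(expr, 1, median)

# Drop unannotated probe sets, then pick the top probe set within each gene.
keep <- !is.na(gene)
best <- vapply(
  split(med[keep], gene[keep]),
  function(x) names(x)[which.max(x)],
  character(1)
)

# One row per gene, represented by its selected probe set.
reduced <- expr[best, , drop = FALSE]
```

Here `best` is a character vector of probe-set names, named by gene, so it can
be used directly to subset the rows of the matrix (or of an ExpressionSet).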

>
> mydata <- ReadAffy()
> pData(mydata) <- read.table("phenodata", header = TRUE, row.names = 1, sep = "\t")
> esetRMA <- rma(mydata)
>
>  >>>Perform probe set reduction here>>>
>
> I would really appreciate your suggestions on how and where I can select the
> probe-set with the highest median expression across all the samples.
>
> Thanks
>
> AK
>
>
>
>
>
>
> On 03-Apr-13 11:34 PM, Martin Morgan wrote:
>> On 04/03/2013 03:17 PM, Atul Kakrana wrote:
>>> Hello All,
>>>
>>> I need your help. I am analysing affymetrix data and have to select the
>>> probe-set that has median expression among all the probe-sets for same
>>> gene. This way I want to remove the redundancy by keeping the analysis
>>> to single gene entry level. I am fully aware that it is not a nice thing
>>> to do but I just have to do it.
>>>
>>> To do so, I came across 'findLargest' function of 'genefilter' package
>>> but it's not well documented; and I do not know how to implement the
>>> 'findLargest' function. At this point I have:
>>> esetRMA <- rma(mydata)
>>>
>>> Could anybody guide me on how I can select a single probe-set with median
>>> expression from multiple probe-sets corresponding to a single gene and
>>> discard the others? Is there any other way to achieve this, i.e. other than
>>> using 'genefilter'?
>>>
>>> Genefilter package:
>>> http://www.bioconductor.org/packages/2.11/bioc/html/genefilter.html
>>
>> Hi Atul -- it's a Bioconductor package, so you might as well ask on the
>> Bioconductor mailing list instead
>>
>>   http://bioconductor.org/help/mailing-list/
>>
>> As a reproducible example, load the "ALL" sample ExpressionSet and the Biobase
>> and genefilter packages
>>
>>   library(Biobase)
>>   library(ALL)
>>   library(genefilter)
>>
>> The three arguments to findLargest are the names of the probe sets
>>
>>   featureNames(ALL)
>>
>> the test statistic
>>
>>   rowMedians(ALL)
>>
>> and the chip on which the ExpressionSet is based
>>
>>   annotation(ALL)
>>
>> So the variable
>>
>>   idx = findLargest(featureNames(ALL), rowMedians(ALL), annotation(ALL))
>>
>> identifies the probes and
>>
>>   ALL1 = ALL[idx,]
>>
>> gets you the data you're interested in.
>>
>> Again, follow-up questions should go to the Bioconductor mailing list.
>>
>> Martin
>>
>>
>>>
>>> Thanks
>>>
>>> AK
>>>
>>
>>
>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


