[BioC] [R] Select single probe-set with median expression from multiple probe-sets corresponding to same gene -AFFY

Atul Kakrana atulkakrana at gmail.com
Thu Apr 4 06:17:37 CEST 2013


Hello Martin and All,

I think my question was not clear, so I would like to rephrase it. I am 
analyzing Affymetrix data and, where several probe-sets map to the same 
gene, I need to keep only one of them. The criterion I need to use is to 
select the probe-set with the highest median expression across all the 
samples.

So, if there are 5 probe-sets corresponding to the same gene, then I need 
to select the one with the highest median expression across all samples to 
represent the expression of that gene. As I am trying to move from 
probe-set-level to gene-level analysis, I was hoping there would already 
be a function for this in 'affy' or 'limma'.
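
To make the criterion concrete, here is a minimal base-R sketch on a toy 
matrix. The probe names, gene names and expression values are all made up 
for illustration, and the probe-to-gene mapping would in practice come from 
the chip annotation rather than a hand-written vector.

```r
# Toy expression matrix: rows are probe sets, columns are samples (made-up values).
expr <- matrix(
  c(5.1, 5.3, 4.9,
    7.2, 7.0, 7.4,
    6.0, 6.1, 5.8,
    3.3, 3.1, 3.6,
    8.0, 8.2, 7.9),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("probe", 1:5), paste0("sample", 1:3))
)

# Hypothetical probe-set-to-gene mapping; real data would take this
# from the chip annotation package.
gene <- c(probe1 = "geneA", probe2 = "geneA", probe3 = "geneA",
          probe4 = "geneB", probe5 = "geneB")

# Median expression of each probe set across all samples.
probe_median <- apply(expr, 1, median)

# Within each gene, keep the name of the probe set with the highest median.
keep <- vapply(split(probe_median, gene[names(probe_median)]),
               function(m) names(m)[which.max(m)],
               character(1))

# Gene-level matrix: one row per gene, taken from the selected probe set.
expr_gene <- expr[keep, , drop = FALSE]
rownames(expr_gene) <- names(keep)
```

With these toy values, probe2 is kept for geneA and probe5 for geneB.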

@Martin: I think you suggested the right solution even though my question 
was not clear. Could you please confirm that? Also, wouldn't it be better 
to perform this step after background correction and normalization? I am 
very confused at this moment.

mydata <- ReadAffy()
pData(mydata) <- read.table("phenodata", header = TRUE, row.names = 1, sep = "\t")
esetRMA <- rma(mydata)

 >>>Perform probe set reduction here>>>

I would really appreciate your suggestions on how and where I can select 
the probe-set with the highest median expression across all the samples.

Thanks

AK






On 03-Apr-13 11:34 PM, Martin Morgan wrote:
> On 04/03/2013 03:17 PM, Atul Kakrana wrote:
>> Hello All,
>>
>> I need your help. I am analysing Affymetrix data and have to select the
>> probe-set that has median expression among all the probe-sets for the same
>> gene. This way I want to remove the redundancy by keeping the analysis
>> at the single-gene-entry level. I am fully aware that it is not a nice thing
>> to do but I just have to do it.
>>
>> To do so, I came across 'findLargest' function of 'genefilter' package
>> but it's not well documented; and I do not know how to implement the
>> 'findLargest' function. At this point I have:
>> esetRMA <- rma(mydata)
>>
>> Could anybody guide me on how I can select a single probe-set with median
>> expression from multiple probe-sets corresponding to a single gene and
>> discard the others? Is there any other way to achieve this, i.e. other than
>> using 'genefilter'?
>>
>> Genefilter package:
>> http://www.bioconductor.org/packages/2.11/bioc/html/genefilter.html
>
> Hi Atul -- it's a Bioconductor package, so you might as well ask on 
> the Bioconductor mailing list instead
>
>   http://bioconductor.org/help/mailing-list/
>
> As a reproducible example, load the "ALL" sample ExpressionSet and the 
> Biobase and genefilter packages
>
>   library(Biobase)
>   library(ALL)
>   library(genefilter)
>   data(ALL)   # load the ALL ExpressionSet into the workspace
>
> The three arguments to findLargest are the names of the probe sets
>
>   featureNames(ALL)
>
> the test statistic
>
>   rowMedians(ALL)
>
> and the chip on which the ExpressionSet is based
>
>   annotation(ALL)
>
> So the variable
>
>   idx = findLargest(featureNames(ALL), rowMedians(ALL), annotation(ALL))
>
> identifies the probes and
>
>   ALL1 = ALL[idx,]
>
> gets you the data you're interested in.
>
> Again, follow-up questions should go to the Bioconductor mailing list.
>
> Martin
>
>
>>
>> Thanks
>>
>> AK
>>
>
>


-- 
Atul Kakrana
DBI, Delaware Technology Park


