[BioC] Genefilter parameters for mouse 430 2 #3

Thu Mar 20 18:48:10 CET 2008

Hi Rich,

Richard Friedman wrote:
> Jim,
> 
>     Thanks again for your quick and helpful reply.
> I have some disagreements and a further question.
> 
> On Mar 19, 2008, at 11:27 PM, James W. MacDonald wrote:
> 
>> Hi Rich,
>>
>> Richard Friedman wrote:
>>> Jim,
>>>     Thank you for your detailed and helpful reply.
>>> On Mar 19, 2008, at 4:52 PM, James W. MacDonald wrote:
>>>>
>>>> That depends. If you are using rma(), then no ;-P
>>> What about gcrma?
>>
>> Same diff. The maximum with either will be ~14, so filtering on 100 
>> will remove everything.
>>
> 
> I filtered on log2(100) = 6.64, which is well under 14. Based upon this
> filter alone I got 9681 probesets.
> 
> This is about 25% of the probesets. I guess I am still wondering if
> there is a way of taking the intensity curve into account in setting
> the cutoff.

Ah. I missed the log2() part, and assumed you were using MAS5 numbers. 
My bad.
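For the record, here is the scale arithmetic in a few lines of base R. The variable names are illustrative, and 14 is the approximate rma()/gcrma() maximum mentioned earlier in the thread:

```r
# A linear-scale (MAS5-style) cutoff of 100, expressed on the log2 scale used by rma()/gcrma()
cutoff_linear <- 100
cutoff_log2 <- log2(cutoff_linear)
print(round(cutoff_log2, 2))  # 6.64

# Approximate ceiling of log2 RMA/GCRMA values quoted in this thread
max_log2 <- 14
# A cutoff of log2(100) sits well below that ceiling...
stopifnot(cutoff_log2 < max_log2)
# ...whereas a literal cutoff of 100 is above every log2 value, so nothing would pass
stopifnot(cutoff_linear > max_log2)

# The ~14 ceiling corresponds to 2^14 on the linear scale
print(2^max_log2)  # 16384
```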

> 
>>>>
>>>> You might try something like
>>>>
>>>> eset2 <- nsFilter(eset)$eset
>>>>
>>>> and see how many probesets you end up with.
>>> I have tried
>>>  > xen2nsSUB<-nsFilter(xen2dataeset)$xen2dataeset
>>>  > sum(xen2nsSUB)
>>> [1] 0
>>>  > xen2nsSUB
>>> NULL
>>
>> Yup. That should be
>>
>> xen2nsSUB <- nsFilter(xen2dataeset)$eset
>>
>> if you just want the resulting ExpressionSet.
> 
> Most helpful!
>>
>>>>
>>>>> If you are just doing fold changes, you might consider filtering on 
>>>>> each fold change rather than overall. For instance you could create 
>>>>> a filter
>>>>
>>>> filt <- filterfun(kOverA(1, 100))
>>>>
>>>> that you would then use for each fold change comparison to ensure 
>>>> that at least one of the samples had an expression > 100. Shameless 
>>>> plug - see foldFilt() in affycoretools.
>>> I think that is basically what I did with genefilter's
>>> pOverA(0.25, log2(100)), described in my first note (0.25 of 4 = 1).
>>> Or am I getting something wrong?
>>
>> Well, that isn't what you did (or maybe it is what you did, but you 
>> didn't do what I am suggesting). If you are doing fold change 
>> calculations then you (IMO) only care about the two things under 
>> consideration. So if you have something like this:
>>
>> Samples    1    2    3    4
>> expval    30    85    1500    2500
>>
>> Then what you did will nuke that probeset. However, the comparisons 
>> for 1v3, 1v4, 2v3, 2v4 and 3v4 are probably quite useful. The only one 
>> you don't care about is 1v2, which will give a high fold change but it 
>> is probably not meaningful.
>>
> 
> I fear that I don't understand filterfun. When I used kOverA(1, log2(100))
> instead of pOverA above, I got the same number of probesets as I did
> with pOverA(0.25, log2(100)) (9681).
> 
> As I understand it, pOverA(0.25, 100) would not eliminate this probeset,
> because at least 25% of the values are above 100.

Again, my bad. You are correct that the pOverA() and kOverA() filters 
will be the same.
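To make the equivalence concrete, here is a minimal base-R sketch. The two helpers are hypothetical stand-ins that mimic the logic of genefilter's pOverA() (fraction of samples above A) and kOverA() (count of samples above A); they are not the package code, and the probeset values are made up for illustration:

```r
# Hypothetical stand-ins mimicking genefilter's pOverA() and kOverA() logic
p_over_a <- function(p, A) function(x) mean(x > A, na.rm = TRUE) >= p
k_over_a <- function(k, A) function(x) sum(x > A, na.rm = TRUE) >= k

cutoff <- log2(100)  # about 6.64

# Made-up 4-sample probesets on the log2 scale
probesets <- rbind(
  none_expressed = c(2, 4, 6, 6.5),
  one_expressed  = c(2, 4, 6, 10),
  two_expressed  = c(3, 5, 11, 12)
)

# Apply each filter row by row (one row per probeset)
keep_p <- apply(probesets, 1, p_over_a(0.25, cutoff))
keep_k <- apply(probesets, 1, k_over_a(1, cutoff))

# With 4 samples, 25% of the samples is exactly 1 sample, so the two filters agree
print(cbind(keep_p, keep_k))
stopifnot(identical(keep_p, keep_k))
```

With a different number of arrays the two stop lining up: for 6 samples, pOverA(0.25, A) needs 2 samples above A (0.25 * 6 = 1.5, rounded up), while kOverA(1, A) still needs only 1.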

But my main contention (that overall filtering is less useful when you
are doing fold-change analyses) remains. In your case this is a moot
point, since you do have duplicates. However, if you just had single
samples (say, a control and three treatments), then filtering with either
kOverA(1, log2(100)) or pOverA(0.25, log2(100)) can still give bad
results.

Say the data were slightly different:

Samples	1	2	3	4
expval	2	4	6	10

Note that these are log2 values.

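Your filter would keep this probeset, because sample 4 (10) is above
log2(100) = 6.64. Yet the 1v2, 1v3 and 2v3 comparisons, none of which
involves a sample above the cutoff, would give you at least four-fold
differences (four-fold for 1v2 and 2v3, sixteen-fold for 1v3), which is
exactly the sort of thing you don't want to see.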
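Here is the arithmetic as a small base-R sketch. It assumes the four log2 values above, and it writes the overall filter out by hand as "at least one sample above log2(100)", which is what kOverA(1, log2(100)) tests:

```r
# log2 expression values for single samples 1-4, from the example above
expval <- c(2, 4, 6, 10)
cutoff <- log2(100)  # about 6.64

# Overall filter, same rule as kOverA(1, cutoff): keep if any sample exceeds the cutoff
kept_overall <- sum(expval > cutoff) >= 1
print(kept_overall)  # TRUE, only because sample 4 is above the cutoff

# Fold change for every pair of samples = 2^(absolute difference of log2 values)
pairs <- combn(length(expval), 2)
fold <- 2^abs(expval[pairs[1, ]] - expval[pairs[2, ]])
names(fold) <- paste0(pairs[1, ], "v", pairs[2, ])
print(fold)
```

This gives 4-fold for 1v2 and 2v3 and 16-fold for 1v3, even though samples 1 to 3 are all below the cutoff.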

However, if you applied kOverA(1, log2(100)) as a separate filter for each
fold-change comparison, you would only end up with comparisons that
involved sample 4, which is what you want (hence my shameless plug for
foldFilt()).
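Here is a rough sketch of per-comparison filtering in base R. pairwise_keep() is a hypothetical helper written for illustration. It is not the implementation of foldFilt(). It simply applies the kOverA(1, cutoff) rule to the two samples in each comparison:

```r
# Hypothetical helper (not affycoretools' foldFilt): keep a pairwise comparison only if
# at least one of its two samples exceeds the cutoff, i.e. kOverA(1, cutoff) on that pair
pairwise_keep <- function(x, cutoff) {
  pairs <- combn(length(x), 2)
  keep <- apply(pairs, 2, function(p) sum(x[p] > cutoff) >= 1)
  names(keep) <- paste0(pairs[1, ], "v", pairs[2, ])
  keep
}

# Single-sample example from this message (log2 values, cutoff log2(100))
print(pairwise_keep(c(2, 4, 6, 10), log2(100)))

# Earlier example from this thread (linear, MAS5-style values, cutoff 100)
print(pairwise_keep(c(30, 85, 1500, 2500), 100))
```

For the single-sample data, only 1v4, 2v4 and 3v4 survive. For the earlier 30/85/1500/2500 probeset, every comparison except 1v2 survives.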

Best,

Jim

> 
> Best wishes,
> Rich

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623