[BioC] Genefilter parameters for mouse 430 2 #3
James W. MacDonald
jmacdon at med.umich.edu
Thu Mar 20 18:48:10 CET 2008
Richard Friedman wrote:
> Thanks again for your quick and helpful reply.
> I have some disagreements and a further question.
> On Mar 19, 2008, at 11:27 PM, James W. MacDonald wrote:
>> Hi Rich,
>> Richard Friedman wrote:
>>> Thank you for your detailed and helpful reply.
>>> On Mar 19, 2008, at 4:52 PM, James W. MacDonald wrote:
>>>> That depends. If you are using rma(), then no ;-P
>>> what about gcrma.
>> Same diff. The maximum with either will be ~14, so filtering on 100
>> will remove everything.
> I filtered on log2(100) = 6.64, which is well under 14. Based on this
> filter alone I got 9681 probesets, about 25% of the probesets. I guess I
> am still wondering whether there is a way of taking the intensity curve
> into account in setting the cutoff.
Ah. I missed the log2() part, and assumed you were using MAS5 numbers.
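For anyone reading along, here is why the scale matters. rma() and gcrma() return log2 values that top out around 14, so a linear-scale cutoff of 100 removes everything, while log2(100) is the equivalent cutoff on the log scale. This is a minimal base-R sketch, and the values in `vals` are made up for illustration:

```r
# Hypothetical log2-scale values, topping out near 14 as rma()/gcrma() output does
vals <- c(2.1, 6.0, 9.5, 13.9)

# Filtering log2 values with a linear-scale cutoff of 100 removes everything
sum(vals > 100)        # 0

# The equivalent log2-scale cutoff keeps the brighter values
cutoff <- log2(100)    # about 6.64
sum(vals > cutoff)     # 2
```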
>>>> You might try something like
>>>> eset2 <- nsFilter(eset)$eset
>>>> and see how many probesets you end up with.
>>> I have tried
>>> > xen2nsSUB<-nsFilter(xen2dataeset)$xen2dataeset
>>> > sum(xen2nsSUB)
>>>  0
>>> > xen2nsSUB
>> Yup. That should be
>> xen2nsSUB <- nsFilter(xen2dataeset)$eset
>> if you just want the resulting ExpressionSet.
> Most helpful!
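The 0 that came back from sum(xen2nsSUB) earlier has a plain-R explanation. nsFilter() returns a list whose components are `eset` (the filtered ExpressionSet) and `filter.log`. Asking for a component that does not exist, such as `$xen2dataeset`, gives NULL, and sum(NULL) is 0. In the sketch below, a plain list stands in for the real nsFilter() result, and its contents are placeholders:

```r
# Stand-in for the list nsFilter() returns; placeholder contents, not real objects
res <- list(eset = "filtered ExpressionSet would be here",
            filter.log = list())

# Asking for a component that doesn't exist silently returns NULL ...
is.null(res$xen2dataeset)   # TRUE
# ... and sum(NULL) is 0, which is the 0 seen in the output above
sum(res$xen2dataeset)       # 0

# The filtered data live in the 'eset' component
names(res)                  # "eset" "filter.log"
```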
>>>> If you are just doing fold changes, you might consider filtering on
>>>> each fold change rather than overall. For instance you could create
>>>> a filter
>>>> filt <- filterfun(kOverA(1, 100))
>>>> that you would then use for each fold change comparison to ensure
>>>> that at least one of the samples had an expression > 100. Shameless
>>>> plug - see foldFilt() in affycoretools.
>>> I think that that is basically what I did with genefilter
>>> described in my first note (0.25 of 4 = 1). Or am I getting something wrong?
>> Well, that isn't what you did (or maybe it is what you did, but you
>> didn't do what I am suggesting). If you are doing fold change
>> calculations then you (IMO) only care about the two things under
>> consideration. So if you have something like this:
>> Samples    1    2     3     4
>> expval    30   85  1500  2500
>> Then what you did will nuke that probeset. However, the comparisons
>> for 1v3, 1v4, 2v3, 2v4 and 3v4 are probably quite useful. The only one
>> you don't care about is 1v2, which will give a high fold change but it
>> is probably not meaningful.
> I fear that I don't understand filterfun. When I used kOverA(1, log2(100))
> instead of pOverA above, I got the same number of probesets as I did
> with pOverA(.25, log2(100)) (9681).
> As I understand it, pOverA(.25, 100) would not eliminate this probeset,
> because at least 25% of the values are above 100.
Again, my bad. You are correct that the pOverA() and kOverA() filters
will be the same.
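The two filters agree because, with four arrays, "at least 1 of 4 above A" and "at least 25% of 4 above A" describe the same test. The sketch below uses base-R stand-ins written to mirror the documented behaviour of genefilter's kOverA() and pOverA(). The names k_over_a and p_over_a are hypothetical, and the stand-ins omit the NA handling of the real functions:

```r
# Base-R stand-ins mirroring genefilter's kOverA()/pOverA() (hypothetical names,
# no NA handling)
k_over_a <- function(k, A) function(x) sum(x > A) >= k
p_over_a <- function(p, A) function(x) sum(x > A) / length(x) >= p

f_k <- k_over_a(1, 100)
f_p <- p_over_a(0.25, 100)

x <- c(30, 85, 1500, 2500)   # the linear-scale example from earlier in the thread
c(k = f_k(x), p = f_p(x))    # both TRUE: the probeset is kept

# Check every possible count (0 to 4) of arrays above the cutoff
same <- sapply(0:4, function(n) {
  y <- c(rep(200, n), rep(50, 4 - n))
  f_k(y) == f_p(y)
})
all(same)                    # TRUE
```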
But my main contention (that overall filtering is less useful if you are
doing fold change analyses) remains. In your case this is a moot point,
since you do have duplicates. However, if you just had single samples
(say a control and three treatments), then filtering using either
kOverA(1, log2(100)) or pOverA(0.25, log2(100)) can still end up giving
you fold changes that aren't meaningful. Say the data were slightly different:
Samples   1   2   3   4
expval    2   4   6  10
Note these are log_2 data.
Your filter would keep this probeset, and the 1v2, 1v3 and 2v3
comparisons would all show at least four-fold differences (1v3 is
actually sixteen-fold), even though you really didn't want to see this
sort of thing.
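On log2 data, a difference of d corresponds to a 2^d-fold change. The sketch below applies this to the example values and computes every pairwise fold change in base R. It shows that the overall filter keeps the probeset and that the comparisons among samples 1 to 3 still come out four-fold or larger:

```r
# log2 expression for one probeset across four single-sample arrays (from this message)
expval <- c(2, 4, 6, 10)
cutoff <- log2(100)                 # about 6.64

# The overall filter keeps it: one array (sample 4) is above the cutoff
sum(expval > cutoff) >= 1           # TRUE

# Linear fold change for every pair of samples: 2^(difference of log2 values)
pairs <- combn(4, 2)
fc <- apply(pairs, 2, function(p) 2^abs(expval[p[1]] - expval[p[2]]))
names(fc) <- apply(pairs, 2, paste, collapse = "v")
fc
# 1v2 = 4, 1v3 = 16, 1v4 = 256, 2v3 = 4, 2v4 = 64, 3v4 = 16
```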
However, if you applied kOverA(1, log2(100)) as a filter for each fold
change, you would only end up with comparisons that involved sample 4,
which is what you would like (hence my shameless plug for foldFilt()).
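The per-comparison idea can be sketched in base R. For each pair of samples, the comparison is kept only if at least one of the two values is above the cutoff, which is the kOverA(1, A) logic applied to two values at a time. This is not foldFilt() itself, and its actual interface is documented in affycoretools. The variable names here are illustrative:

```r
# log2 values and cutoff from the example in this message
expval <- c(2, 4, 6, 10)
cutoff <- log2(100)

# Filter each pairwise comparison separately: keep it only if at least one
# of the two samples is above the cutoff (kOverA(1, A) logic per pair)
pairs <- combn(length(expval), 2)
keep <- apply(pairs, 2, function(p) sum(expval[p] > cutoff) >= 1)
names(keep) <- apply(pairs, 2, paste, collapse = "v")
names(keep)[keep]                   # "1v4" "2v4" "3v4"
```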
James W. MacDonald, M.S.
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
Ann Arbor MI 48109