[BioC] nsFilter cutoff

Tue Jun 24 12:14:37 CEST 2008

Yes, that makes perfect sense now. I thought this might be the case, but
the additional filtering (requiring an Entrez ID, for example) meant that I
didn't end up with half the initial number of probesets, which threw me a
little.
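
For anyone else puzzled by the counts, here is a minimal base-R sketch of
the ordering, assuming simulated data. The has_entrez flag is hypothetical
and only stands in for the annotation-based filtering; this is not
nsFilter's actual code, just the same quantile logic applied to a toy
matrix:

```r
# Toy illustration in base R only; the matrix and the annotation flag are
# simulated stand-ins, not real nsFilter() internals.
set.seed(1)
expr <- matrix(rnorm(1000 * 10), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000), NULL))

# Hypothetical annotation step: pretend 700 of 1000 probesets have an Entrez ID.
has_entrez <- rep(c(TRUE, FALSE), times = c(700, 300))
annotated <- expr[has_entrez, , drop = FALSE]

# Variance step with the cutoff read as a quantile (filterByQuantile = TRUE):
# 0.5 becomes the median IQR of the probesets that survived annotation.
iqr <- apply(annotated, 1, IQR)
cutoff <- quantile(iqr, 0.5)
kept <- annotated[iqr > cutoff, , drop = FALSE]

nrow(expr)       # initial probesets
nrow(annotated)  # after the annotation step
nrow(kept)       # about half of the annotated set, not half of the initial set

# For contrast, the cutoff read as an absolute IQR threshold
# (filterByQuantile = FALSE):
sum(iqr > 0.5)
```

So the 50% applies to whatever is left after the annotation step, which is
why the final count is well below half of the starting probesets.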

Thanks and regards,

Jim

James W. MacDonald wrote:
> Hi James,
>
> james perkins wrote:
>> Hi James
>>
>> I meant when we have filterByQuantile as TRUE. In this case it seems 
>> to behave differently, and I can't figure out why, and I don't want 
>> to guess!
>
> OK. That's a different question. The details section of the help page 
> explains this:
>
> Note that by default the numerical-filter cutoff is interpreted as
>      a quantile, so leaving the default values intact would filter out
>      50% of the genes remaining at this stage. If you prefer to set the
>      cutoff at some absolute threshold, change the value of
>      'filterByQuantile' to 'FALSE', and modify 'var.cutoff' accordingly.
>
> And looking at the code should help further:
>
>
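>     if (var.filter) {
>         # per-probeset variability (var.func is IQR by default)
>         esetIqr <- apply(exprs(eset), 1, var.func)
>         if (filterByQuantile) {
>             # reinterpret var.cutoff as a quantile of the observed values
>             if (0 < var.cutoff && var.cutoff < 1) {
>                 var.cutoff = quantile(esetIqr, var.cutoff)
>             }
>             else stop("Cutoff Quantile has to be between 0 and 1.")
>         }
>         # keep probesets strictly above the cutoff
>         selected <- esetIqr > var.cutoff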
>
> So if you leave filterByQuantile = TRUE, then after the
> annotation-based filtering (GO, Entrez Gene, AFFX probesets,
> duplicates), nsFilter takes what remains and, with the default
> var.cutoff of 0.5, filters out the 50% with the lowest IQR.
>
> Does that help?
>
> Best,
>
> Jim
>
>
>>
>> Regards,
>>
>> Jim
>>
>> James W. MacDonald wrote:
>>> Hi James,
>>>
>>> james perkins wrote:
>>>> Hi,
>>>>
>>>> I am finding the nsFilter IQR cutoff somewhat confusing.
>>>>
>>>> It says it is using IQR with a default cutoff of 0.5.
>>>>
>>>> This gives the impression that if you line up the data and take the
>>>> difference between the 0.75 and 0.25 quantiles, you would keep the
>>>> probeset if this value was > 0.5.
>>>>
>>>> However, this does not seem to be the case, so I would like to know
>>>> exactly how this works.
>>>
>>> Actually it _is_ the case - perhaps you misunderstand something.
>>>
>>> First, get all probesets with an IQR > 0.5
>>> > T1 <- apply(exprs(sample.ExpressionSet), 1, IQR) > 0.5
>>>
>>> Now do the same using nsFilter()
>>> > T2 <- nsFilter(sample.ExpressionSet, require.entrez = FALSE,
>>> +     filterByQuantile = FALSE, feature.exclude = "",
>>> +     remove.dupEntrez = FALSE)
>>>
>>> Are they the same?
>>> > all.equal(featureNames(sample.ExpressionSet)[T1],
>>> +     featureNames(T2$eset))
>>> [1] TRUE
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>>
>>>> Regards,
>>>>
>>>> James
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: 
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>