[BioC] BH vs BY in p.adjust

lemerle at embl.de
Sat Jul 29 01:53:44 CEST 2006


Quoting Wolfgang Huber <huber at ebi.ac.uk>:

> Hi Caroline,
>
> what does
>
>  hist(fit2$p.value[,1], breaks=100, col="orange")
>
> look like?

hi wolfgang,
well.... not at all uniform:

> x <- hist(fit2$p.value[,1], breaks=30,
>           col="orange", main="distribution of raw p-values", labels=TRUE)

> cbind(x$mids,x$counts)
       [,1] [,2]
 [1,] 0.025 1891
 [2,] 0.075  270
 [3,] 0.125  209
 [4,] 0.175  123
 [5,] 0.225  100
 [6,] 0.275  101
 [7,] 0.325   79
 [8,] 0.375   79
 [9,] 0.425   85
[10,] 0.475   57
...

but from here on, the distribution is uniform (around 50 in every bin up to
p-val=1). so there are a lot of differential probesets in this contrast. but
between the 519 estimated from the BY adjustment and the 1032 from BH, both at
1% FDR, there's quite a difference.... or can i estimate the number directly
from this histogram? subtracting the baseline gives me 2439 probesets, almost
70% of the whole set:

> baseline <- mean(x$counts[11:20])
> sum(x$counts-baseline)
[1] 2439

how safe is this?
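to make the arithmetic behind that estimate explicit, here is a sketch in Python (the thread's own code is R; this just spells out the same calculation). the bin counts below 0.5 are the ones posted above; the ten bins above 0.5 are ASSUMED to sit at the flat baseline of about 50 counts each, since the actual tail counts were not posted, so the total comes out slightly different from the 2439 obtained with the real counts:

```python
# Histogram-baseline estimate of the number of differential probesets.
# Counts for p < 0.5 are the ones posted above; counts for p >= 0.5 are
# ASSUMED flat at ~50 per bin (the actual tail counts were not posted).
counts_low = [1891, 270, 209, 123, 100, 101, 79, 79, 85, 57]  # p in [0, 0.5)
counts_high = [50] * 10                                       # p in [0.5, 1), assumed
counts = counts_low + counts_high

# Baseline = mean count of the flat tail, attributed to true nulls.
baseline = sum(counts_high) / len(counts_high)

# Counts above the baseline are attributed to differential probesets.
n_differential = sum(c - baseline for c in counts)
print(n_differential)  # 2494.0 with these assumed tail counts
```

the caveat is that true alternatives can also contribute counts to the flat region, which inflates the baseline; the number is therefore better read as a rough lower bound on the count of differential probesets than as an exact figure.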

> Assuming that the p-values are uniformly distributed under
> the null hypothesis (which they should be if they are worth their 
> name), then the shape of this distribution can give you an idea of 
> the proportion of differential vs non-differential probesets.

by the way, in cases where the distribution is not uniform, can we not get an
idea of the effect size associated with the differential probesets from the
range of p-values covered by the over-represented bins of the histogram?
or the other way around: if i happened to know that there were differential
probesets, but all of only moderate effect size, i might expect a bulge at
moderate p-values, while lower ones could well be uniformly distributed, right?
but then, if that were the case, could it also be that if all differential
probesets had similar p-values, say 0.2, they could more easily be discovered
than the same number spread over a lower but wider range of p-values, only
because they would add significance to each other?
this doesn't quite sound right if it's true that the adjustment procedure
preserves the ranking that the genes have from their raw p-values.

thanks for the help and comments

caroline

>
> Best wishes
>  Wolfgang
>
> -- 
> ------------------------------------------------------------------
> Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
>
>> dear list,
>>
>> i am analysing a set of affymetrix chips using limma to get lists of
>> differentially expressed probesets.
>> concerning BH and BY, the two methods available in p.adjust that adjust
>> p-values controlling the false discovery rate, i was surprised to find a
>> large discrepancy in the number of differentially expressed genes for one
>> contrast of interest:
>>
>>> fit <- lmFit(eset.f,design)
>>> fit2 <- contrasts.fit(fit,c.matn)
>>> fit2 <- eBayes(fit2)
>>
>> # adjustment for multiple testing of the set of p-values derived from the
>> first contrast:
>>
>>> p.value <- fit2$p.value[,1]
>>> bh <- p.adjust(p.value, method = "BH")
>>> by <- p.adjust(p.value, method = "BY")
>>
>> # number of differentially expressed genes below the cutoff value of 0.01
>> for the 2 methods
>>
>>> length(which(bh<0.01))
>>
>> [1] 1032
>>
>>> length(which(by<0.01))
>>
>> [1] 519
>>
>> # size of the original set: 3549 probesets (with 15 observations for each)
>>
>>> dim(exprs(eset.f))
>>
>> [1] 3549   15
>>
>> i don't think there is a bug anywhere (though i'd gladly send you my data
>> so you can reproduce this); i'm more inclined to think that i know too
>> little to make sense of this at this stage....
>>
>> could this difference be due to the fact that these numbers correspond to a
>> large fraction of the whole set tested and that the two methods behave
>> differently with respect to this?
>>
>> if estimates of false positives at this cutoff were reliable with 
>> both methods,
>> wouldn't this imply that BH gives a far better coverage of the set of truly
>> differentially expressed genes than BY (TP/(TP+FN)), for the same 
>> control over
>> false discovery (here, 1%)?
>>
>> i've seen it mentioned, but only vaguely, that the magnitude of change is
>> an issue and that the methods work best when it is low.... if you think
>> that may be it, could you point me to something i might read to understand
>> this better?
>>
>> a related question i have is about the interpretation of FDR values close
>> to 1 in such cases where there are many changes, i.e. when the fraction of
>> non-differentially expressed genes in the whole set is far below 1...
>>
>> i've come to understand that methods controlling the fdr replace the
>> p-value of each gene with a value that corresponds to the minimum cutoff
>> to apply in order for that gene to be called positive, with the added
>> information that the set obtained by applying this minimum cutoff can be
>> expected to contain that percentage of false positives.
>> but as the fdr ranges from 0 to 1, the chosen cutoff can fall above the
>> true percentage of non-differentially expressed probesets, and i don't see
>> how such an estimate can still be reliable up to 1 if the true number of
>> false positives is far lower.... unless it represents only an upper bound
>> on how many false positives to expect... is this the case?
>>
>> thanks for any comment, i'm confused about this....
>>
>> caroline
>>
>
>



-- 
Caroline Lemerle
PhD student, lab of Luis Serrano
Structural Biology and Biocomputing Dept
tel 00-49-6221-387-8335
--


