[BioC] understanding ACME
Ramon Diaz-Uriarte
rdiaz02 at gmail.com
Tue Feb 5 01:15:31 CET 2008
Hi Sean,
Oh, yes, it does help. I misunderstood because I thought that the
chi-square was comparing expected with observed, taking the expected
from the proportions of above/below the threshold from the complete
data set (i.e., as if doing a chi-square test against a know,
theoretical, proportion).
Thanks for your explanation.
Best,
R.
On Feb 4, 2008 8:44 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> On Feb 4, 2008 1:50 PM, Ramon Diaz-Uriarte <rdiaz at cnio.es> wrote:
> > hits=-2.6 testsºYES_00
> > X-USF-Spam-Flag: NO
>
> >
> > Dear All,
> >
> > I am trying to understand how the ACME package (by Sean Davis) works, but I
> > think there is something I am missing about the way the p-values are
> > computed. It seems when I try to do the chi-square test myself, I always
> > overestimate the p-value.
> >
> >
> > The following is a complete example:
> >
> > silly.dat <- c(2, 1, 5, 3, 6, 4)
> > dummy.data <- new("aGFF", data = matrix(silly.dat, ncol = 1),
> > annotation = data.frame(Chromosome = 1,
> > Location = c(1, 10, 20, 1000, 1200, 1300)),
> > samples = data.frame(SampleID = 1))
> >
> > ### So we have
> > cbind(silly.dat, Position = c(1, 10, 20, 1000, 1200, 1300))
> > silly.dat Position
> > [1,] 2 1
> > [2,] 1 10
> > [3,] 5 20
> > [4,] 3 1000
> > [5,] 6 1200
> > [6,] 4 1300
> >
> >
> > do.aGFF.calc(dummy.data, window = 110, thresh = 0.90)
> > ### Cutpoints:
> > ### [1] 5.5
> > ### Thus, all values except 6 (the fifth value) are below the threshold
> >
> >
> > ## Within the window for fifth value we have:
> > ## only value of 6
> > do.aGFF.calc(dummy.data, window = 10, thresh = 0.90)@vals[5] ## 0.0877
> > chisq.test(x = c(0, 1), p = as.vector(table(silly.dat > 5.5)/6)) ## 0.02535
> >
> > ## value of 6 and 4
> > do.aGFF.calc(dummy.data, window = 202, thresh = 0.90)@vals[5] ## 0.3458
> > chisq.test(x = c(1, 1), p = as.vector(table(silly.dat > 5.5)/6)) ## 0.2059
> >
> > ## values 6, 4, 3
> > do.aGFF.calc(dummy.data, window = 402, thresh = 0.90)@vals[5] ## 0.571
> > chisq.test(x = c(2, 1), p = as.vector(table(silly.dat > 5.5)/6)) ## 0.4386
> >
> >
> > Where am I computing the chisq in the wrong way?
>
> Hi, Ramon. ACME is simply calculating the chi-square on a 2x2 table
> where the cells have the values defined like so:
>
> a=number of probes on the array above the threshold
> b=number of probes total on the array NOT above the threshold
> c=number of probes in the window above the threshold
> d=number of probes in the window NOT above the threshold
>
> So, building on your example above in which a=1,b=5,c=1, and d varies as below:
>
> ## d=0
> chisq.test(x=matrix(c(1,5,1,0),nc=2),correct=FALSE) ## 0.08767
>
> ## d=1
> chisq.test(x=matrix(c(1,5,1,1),nc=2),correct=FALSE) ## 0.3458
>
> ## d=2
> chisq.test(x=matrix(c(1,5,1,2),nc=2),correct=FALSE) ## 0.5708
>
> Does this explanation help?
>
> Sean
>
--
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
More information about the Bioconductor
mailing list