[BioC] questions regarding ACME

Fri Jun 5 16:09:28 CEST 2009

On Tue, Jun 2, 2009 at 8:43 PM, <vandiedo at well.ox.ac.uk> wrote:
> Dear Sean,
>
> I am just a new user of your ACME package which is really fantastic. I
> would be very grateful if you could answer the following technical
> questions:

Hi, Claire.  Thanks for the interest.  I have been wanting to rewrite
significant chunks of ACME to use a new class structure and to fix
some known issues.  The new version is pretty close to complete.  It
is version 2.1.0 and is available in the development section of
Bioconductor.  Give things a day or so to propagate through the build
system.

> 1) The current version of ACME cannot handle NA values because of a
> quantile function called in the do.aGFF.calc(). I have several samples to
> analyse in parallel and some probes have a NA value for one sample but not
> necessarily for the others. I wish I could use the same R object with all
> my probes for all my samples. So, I have been trying to edit the
> do.aGFF.calc() by adding na.rm=TRUE in the quantile function but then the
> following windowChisq() isn't recognised. I cannot find where this
> function has been defined. Would there be an easy way to deal with NA
> values?

A bug.  Now fixed.
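
For illustration only (this is not the ACME code, and it is Python
rather than R): the behaviour asked for is a threshold quantile
computed after dropping missing values, which is what na.rm=TRUE does
in R's quantile().  The function name quantile_ignoring_nan is made up
for this sketch.

```python
import math


def quantile_ignoring_nan(values, q):
    """Return the q-th quantile (0 <= q <= 1) of values, skipping NaN entries.

    Interpolates linearly between order statistics, the same rule as the
    default (type 7) of R's quantile().
    """
    clean = sorted(v for v in values if not math.isnan(v))
    if not clean:
        raise ValueError("no non-missing values")
    pos = q * (len(clean) - 1)   # fractional index into the sorted values
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return clean[lo] + (pos - lo) * (clean[hi] - clean[lo])


# One sample with two missing probes; the cutoff uses the other four values.
signal = [0.2, float("nan"), 1.5, 0.7, float("nan"), 3.1]
threshold = quantile_ignoring_nan(signal, 0.95)
```

Computing the cutoff per sample this way lets every sample keep the
same probe set even when different probes are missing in each.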

> 2) It occurred to me ACME cannot work with only one sample...Why? Does it
> use the information from other samples?

A bug.  Now fixed.

> 3)...hence this silly question: does ACME work well on ratios (i.e. IP
> versus input for ChIP on chip, or DNaseI in vivo versus in vitro digestion
> for HSD mapping) to search for enrichment?

ACME does not rely on a specific data type.  Use ratios, intensities,
or whatever.  As long as the signal is expected to increase in regions
of interest, ACME should work fine.

> 4) Another question relative to the samples. I am not sure I understand
> the meaning of the names of the output files generated by write.sgr().
> Does "./1_thresh0.95.sgr" correspond to the first sample? If so, then
> what does "NA" mean for the next output written? I understand this is the
> second sample but then, if I have three samples, the third output
> overwrites the second, etc...so only the last sample output is saved in
> addition to the first one. I wish I could save output for each of my
> samples.

Another "bug".  Fixed, hopefully.

> 5) Can findClosestGene() handle more recent assemblies than hg17?

Yes.  Fixed.  Just specify the alternative build (does not need to be human).

> 6) All p values are not adjusted for multiple testing. In the Scacheri PC
> et al paper, you recommend to estimate empirical p values by permutation.

I'm not sure that permutation will deal with the problem of multiple
testing.  You could certainly apply a multiple-testing correction if
you like.  I would take the p-values with a grain of salt and use them
as a guide rather than as a measure of "absolute" significance.
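
As one example of such a correction, here is a minimal sketch of the
Benjamini-Hochberg adjustment, written in plain Python for
illustration.  It is not part of ACME; in R the same adjustment is
available as p.adjust(p, method="BH").  The function name
benjamini_hochberg and the example p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.001, 0.04, 0.03, 0.5]   # hypothetical per-probe p-values
adj = benjamini_hochberg(raw)
```

Note that neighbouring windows overlap, so the per-probe p-values are
not independent; that is one more reason to treat adjusted values as a
guide.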

> 7) Could you please provide some more details on the smoothing approach
> implemented in ACME? I have been applying to my DHS data some similar
> window smoothing as described in Sabo PJ et al 2006 (PMID: 16791208). When
> applying ACME on these already smoothed data, I obtain much cleaner
> results than applying it on my normalised data. Therefore, I would be keen
> to know more about your smoothing method to make sure there isn't any
> redundancy.

There is no need to do any smoothing, and it could be
counterproductive.  The moving-window chi-square is very robust to
both outliers and "poor probes" that don't show enrichment within a
region of enrichment.  For that reason ACME does no explicit smoothing
at all; the heuristic itself functionally smooths the data.
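
To make the idea concrete, here is a simplified sketch of a
moving-window chi-square, in plain Python.  It is an illustration
under my own assumptions, not the ACME implementation: the function
name, the parameters, the two-category goodness-of-fit form of the
test, and the toy data are all made up.  Each probe is marked positive
if it exceeds a cutoff, and each window is tested for holding more or
fewer positives than expected by chance.

```python
import math


def window_chisq_pvalues(positions, signals, threshold, expected_frac, window):
    """Chi-square p-value for the window centred on each probe.

    positions     -- probe coordinates in bp
    signals       -- probe values; NaN entries are skipped
    threshold     -- cutoff above which a probe counts as positive
    expected_frac -- fraction of probes expected above the cutoff by chance
                     (0.05 for a cutoff at the 0.95 quantile)
    window        -- full window width in bp
    """
    pvalues = []
    for center in positions:
        total = 0
        positive = 0
        for pos, sig in zip(positions, signals):
            if abs(pos - center) <= window / 2 and not math.isnan(sig):
                total += 1
                positive += sig > threshold
        if total == 0:
            pvalues.append(float("nan"))
            continue
        exp_pos = total * expected_frac
        exp_neg = total - exp_pos
        stat = ((positive - exp_pos) ** 2 / exp_pos
                + ((total - positive) - exp_neg) ** 2 / exp_neg)
        # Upper tail of a chi-square distribution with 1 degree of freedom.
        pvalues.append(math.erfc(math.sqrt(stat / 2)))
    return pvalues


# Toy region: probes 3-7 are enriched, except a "poor probe" at index 5.
positions = list(range(0, 1000, 100))
signals = [0.1, 0.2, 0.1, 2.0, 2.2, 0.1, 2.1, 1.9, 0.2, 0.1]
pvals = window_chisq_pvalues(positions, signals, threshold=1.0,
                             expected_frac=0.05, window=400)
```

In this toy example the poor probe at index 5 still gets a very small
p-value because four of the five probes in its window are positive,
while the background probe at index 0 does not.  The test depends only
on counts above the cutoff, so a single extreme value has no more
influence than any other positive probe.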

> I appreciate this is a long list of questions and that your time is very
> precious. I thank you very much in advance for your help and for the time
> you will be willing to spend on it.

And I appreciate your valuable comments.  If you see problems with the
new version, let me know.

Sean