[BioC] DEseq for chip-seq data normalisation

Giuseppe Gallone giuseppe.gallone at dpag.ox.ac.uk
Wed Nov 6 13:24:19 CET 2013


Hi Rory

I suppose the analysis I'd like to carry out is conceptually simpler 
than those I was looking at using DiffBind. I'd like to look at the 
presence versus complete depletion of peaks at each interval across 
samples, as opposed to continuous quantitative differences in the peak 
signal at each interval across samples.

Given 10 samples, if I take the consensus peakset generated by DiffBind 
with, say minOverlap=8, I'd like to look at each of those peak positions 
and find sample pairs where a peak is there for sample_1 but not there 
for sample_2. Then a question could be: are there genetic differences in 
the motif under this peak which might cause the complete depletion?
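
To make the pairing concrete, here is a minimal sketch in plain Python 
(not DiffBind code). The peak IDs, sample names and the `calls` table are 
hypothetical; in practice the presence/absence calls would come from the 
per-sample peak calls overlapping each consensus interval.

```python
# Hypothetical presence/absence calls: peak id -> {sample name: peak called?}
calls = {
    "peak_1": {"s1": True, "s2": True, "s3": True},
    "peak_2": {"s1": True, "s2": False, "s3": True},
}


def discordant_pairs(calls):
    """Return {peak: [(sample_with_peak, sample_without_peak), ...]}."""
    result = {}
    for peak, by_sample in calls.items():
        present = sorted(s for s, called in by_sample.items() if called)
        absent = sorted(s for s, called in by_sample.items() if not called)
        # Every (present, absent) combination is a candidate pair to follow up.
        pairs = [(p, a) for p in present for a in absent]
        if pairs:
            result[peak] = pairs
    return result


pairs = discordant_pairs(calls)
```

Peaks called in every sample (peak_1 here) drop out, and only intervals 
with at least one present/absent pair remain for the motif follow-up.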

If I had technical replicates, I imagine I'd be able to use DiffBind as 
follows:

- set contrast: (sample_1r1, ..., sample_1ri) vs (sample_2r1, ..., 
  sample_2ri)
- dba.analyze (sample_1 vs sample_2)
- follow up peak sites with complete depletion vs signal

However, I do not have technical replicates and, if I understand 
correctly, I cannot use DiffBind/edgeR in this case.

So my idea is simply to select peak positions with large overlap across 
samples, after subsampling the BAMs to a minimum common depth, and then 
look at (signal/no signal) pairs for those peak positions manually.
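
As a rough sketch of what I mean by subsampling to a minimum common 
value: each library gets a keep-fraction equal to the smallest depth 
divided by its own depth. The read counts below are made up, and the 
resulting fraction is only the kind of value one would pass to a 
downsampling tool as a keep-probability; I am assuming nothing about any 
particular tool's options.

```python
# Hypothetical mapped-read counts per library (not real data)
depths = {"s1": 20_000_000, "s2": 12_500_000, "s3": 10_000_000}


def subsample_fractions(depths):
    """Fraction of reads to keep so every library matches the shallowest one."""
    target = min(depths.values())
    return {sample: target / depth for sample, depth in depths.items()}


fractions = subsample_fractions(depths)
```

The shallowest library keeps all of its reads (fraction 1.0); every other 
library is thinned in proportion to its depth.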

Would you recommend using the overall scaling factors for each library 
in this case?
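
For reference, this is how I understand a DESeq-style scaling factor 
(median of ratios to the per-peak geometric mean). It is a plain Python 
sketch on made-up counts, not the package's own code, and edgeR's default 
method differs.

```python
import math
from statistics import median

# Hypothetical raw read counts per peak, per library (not real data)
counts = {
    "s1": [100, 200, 400],
    "s2": [50, 100, 200],
}


def size_factors(counts):
    """Median-of-ratios scaling factor per library."""
    samples = list(counts)
    n_peaks = len(counts[samples[0]])
    factors = {s: [] for s in samples}
    for i in range(n_peaks):
        values = [counts[s][i] for s in samples]
        # Peaks with a zero count in any library have no usable geometric mean.
        if any(v <= 0 for v in values):
            continue
        log_geo_mean = sum(math.log(v) for v in values) / len(values)
        for s in samples:
            factors[s].append(math.log(counts[s][i]) - log_geo_mean)
    # Median of the log ratios, converted back to the linear scale.
    return {s: math.exp(median(log_ratios)) for s, log_ratios in factors.items()}


sf = size_factors(counts)
```

With these toy counts s1 is exactly twice as deep as s2 at every peak, so 
the two factors differ by a factor of two.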

Cheers
G

On 11/06/13 11:32, Rory Stark wrote:
> Hi Giuseppe-
>
> I'm not sure why you want to downsample? The normalization is supposed to
> take care of differences in read depth and distribution of read densities
> amongst peaks.
>
> edgeR/DESeq/DESeq2 do calculate overall scaling factors for each library
> as part of the normalization computation, so it may be useful to retrieve
> those to see how each library is being weighted. This would be better than
> basically reverse-engineering it by calculating the ratios between the
> original values and the normalized ones.
>
> Cheers-
> R
>
>> Date: Wed, 06 Nov 2013 10:48:56 +0000
>> From: Giuseppe Gallone <giuseppe.gallone at dpag.ox.ac.uk>
>>
>> Thanks a lot Rory. Do you think it would then make sense to use the
>> normalised counts in the peaks to build a ratio based on the raw count
>> and then feed this ratio to, say, picard to get a downsampled bam from
>> the original bam?
>>
>> Best
>> Giuseppe
>