[BioC] DiffBind Question

Fri Oct 18 12:05:00 CEST 2013

Hmmn, how exactly are you retrieving the matrix? When I run dba.peakset as
described below, the columns are named using the DBA_ID value for the
sample.

Can you send a sessionInfo() so I can see what version you are working on?

Here's what I see:

> data(tamoxifen_counts)
> tamoxifen = dba.count(tamoxifen,peaks=NULL,score=DBA_SCORE_READS)
> dba.peakset(tamoxifen,bRetrieve=T)
GRanges with 1654 ranges and 11 metadata columns:
       seqnames               ranges strand   |  BT474.1.  BT474.2.
MCF7.1.   MCF7.2.   MCF7.3.   T47D.1.   T47D.2. MCF7.1..1 MCF7.2..1
ZR75.1.   ZR75.2.
          <Rle>            <IRanges>  <Rle>   | <numeric> <numeric>
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
<numeric> <numeric>
     1    chr18     [ 95744, 102011]      *   |      2494      2239
2885      2202      2624      2951      7185      2125      1532      6399
    13055
     2    chr18     [179080, 179752]      *   |        16        25
28        20        46         4        10        77        64        19
     53
     3    chr18     [205229, 206269]      *   |        32        69
11        15        33         9        44        20        26       119
    208
     4    chr18     [301531, 302260]      *   |        52        40
21        14         9         6        19         3         9        34
     78
     5    chr18     [336592, 337347]      *   |        19        13
71        38        48         6        37        11        15        15
     27
   ...      ...                  ...    ... ...       ...       ...
...       ...       ...       ...       ...       ...       ...       ...
     ...
  1650    chr18 [75641971, 75642647]      *   |         3         7
39        25        41        17        16         8         5        13
     40
  1651    chr18 [75795125, 75796156]      *   |        53        60
204        84       152        15        26        57        21        14
      22
  1652    chr18 [75826088, 75826939]      *   |        36        45
23        14        33        25        36        40        22        46
     80
  1653    chr18 [76068994, 76069874]      *   |        64        64
58        44        58         9        42        90        52        51
    155
  1654    chr18 [76087923, 76089259]      *   |        27        10
74        50        58        11        32        43        26       232
    603

  ---
  seqlengths:
   chr18
      NA

Where "BT424.1." is a sample name. It turns out that GRanges doesn't like
the "+" and "-" -- you can see them if you get the result back as a
data.frame:

> head(dba.peakset(tamoxifen,bRetrieve=T,DataType=DBA_DATA_FRAME))
    CHR  START    END BT474.1- BT474.2- MCF7.1+ MCF7.2+ MCF7.3+ T47D.1+
T47D.2+ MCF7.1- MCF7.2- ZR75.1+ ZR75.2+
1 chr18  95744 102011     2494     2239    2885    2202    2624    2951
7185    2125    1532    6399   13055
2 chr18 179080 179752       16       25      28      20      46       4
  10      77      64      19      53
3 chr18 205229 206269       32       69      11      15      33       9
  44      20      26     119     208
4 chr18 301531 302260       52       40      21      14       9       6
  19       3       9      34      78
5 chr18 336592 337347       19       13      71      38      48       6
  37      11      15      15      27
6 chr18 346557 347362       53       52      11      13      13       9
  25      12      13      23      51

Cheers-
Rory

On 17/10/2013 21:15, "Stephen Mack" <stephen.mack at mail.utoronto.ca> wrote:

>Thanks Rory,
>
>When the matrix is exported, it looks like it's without headers
>indicating sample IDs. Is there any way of exporting these or determining
>what the order is? I assume it's just the same order that the samples
>were read in from left to right?
>
>Cheers,
>
>Steve
>
>
>On 2013-10-17, at 12:49 PM, Rory Stark <Rory.Stark at cruk.cam.ac.uk> wrote:
>
>> Hi Stephen-
>> 
>> 1. dba.analyze itself doesn't use any thresholds, it just computes the
>> p-values and FDRs for every binding site. The thresholds are used in the
>> reporting and plotting functions: dba.report, dba.plotMA,
>>dba.plotHeatmap,
>> dba.plotPCA, and dba.plotBox. The generic plot function corresponds to
>> dba.plotHeatmap and you can specify thresholds. eg:
>> 
>>> DBSites = dba.report(DBA,contrast=1, th=.05, bUsePval=TRUE)
>>> dba.plotMA(DBA, th=.05, bUsePvals=TRUE)
>> 
>> NOTE: I've just added the ability to set the default threshold for the
>> object instead of the global FDR < 0.1 currently implemented. This
>>feature
>> will appear in the development version (as 1.9.2) soon. These values
>>will
>> initialized as:
>> 
>>> DBA$config$th = 0.1
>>> DBA$config$pUsePval = FALSE
>> 
>> and can subsequently be changed at will.
>> 
>> 2. You can export the entire affinity binding matrix (based on the
>> consensus set after running dba.count) using dba.peakset. If you set the
>> "bRetrieve" parameter to TRUE, this will return the entire matrix (the
>> "DataType" parameter controls the form of how this is returned; default
>> type is GRanges). If you set the "writeFile" parameter to a filename,
>>the
>> data will be exported to a tab-delimited file of that name. eg:
>> 
>>> affinities = dba.peakset(DBA,writeFile="BindingAffintyMatrix.bed")
>> 
>> Cheers-
>> Rory
>> 
>> 
>> On 17/10/2013 16:55, "Stephen Mack" <stephen.mack at mail.utoronto.ca>
>>wrote:
>> 
>>> Hi Rory,
>>> 
>>> I was hoping you could help with with two DiffBind questions:
>>> 
>>> 1) Is it possible to adjust the p-value / FDR threshold for the
>>> dba.analyze function ?
>>> 
>>> 2) Is it possible to export the consensus / affinity binding matrices ?
>>> 
>>> Thanks,
>>> 
>>> Stephen Mack
>>> 
>>> Grad Student
>>> The Hospital for Sick Children
>>> Brain Tumour Research Centre
>>> Toronto, ON
>>> Canada
>> 
>> 
>
>