[BioC] error loading dba in DiffBind

Wed Jun 19 19:54:50 CEST 2013

Yes, in DiffBind 1.6 I extended the detail with which you can specify a peak file format; as a result it is a bit fussier, though more flexible in the range of formats it accepts.  I do apologize for the lack of backward compatibility in this particular case!

-Rory
________________________________________
From: Gordon Brown
Sent: 19 June 2013 17:23
To: Lawson, Nathan
Cc: bioconductor at r-project.org; Rory Stark
Subject: Re: error loading dba in DiffBind

Hi,

Turns out your sample sheet doesn't specify the peak format or peak
caller.  If you try again with:

> x = dba(sampleSheet='AVbothChr6.csv',peakFormat='bed')

you should be able to create the DBA object.  The surprising thing is that
it worked as-is in R 2.15.  Maybe Rory changed the default peak format...
not sure.
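
For what it's worth, here is a minimal base-R sketch of how I think the
error arises.  It assumes (I have not checked this against the DiffBind
source) that without a peakFormat the score is taken from column 4, which
in a 5-column BED file holds the peak name rather than a number.  The file
contents below are made up for illustration, and stringsAsFactors=TRUE is
set explicitly to mimic the read.table default in R 3.0.

```r
# Made-up 5-column BED lines: chrom, start, end, name, score
bed_lines <- c("chr6\t100\t300\tpeak_1\t12.5",
               "chr6\t500\t650\tpeak_2\t7.0",
               "chr6\t900\t1300\tpeak_3\t30.1")
bed_file <- tempfile(fileext = ".bed")
writeLines(bed_lines, bed_file)

# stringsAsFactors = TRUE mimics the read.table default in R 3.0
peaks <- read.table(bed_file, sep = "\t", stringsAsFactors = TRUE)
width <- peaks[, 3] - peaks[, 2]

# Column 4 (the name) is a factor, so dividing it is not meaningful:
# R warns and returns NA for every row
bad <- suppressWarnings(peaks[, 4] / width)

# Testing an NA inside if() raises an error instead of returning a value;
# the error message is captured here as a string
err <- tryCatch(if (bad[1] >= 0) "ok",
                error = function(e) conditionMessage(e))

# Column 5 (the score) is numeric and divides cleanly
good <- peaks[, 5] / width

print(class(peaks[, 4]))
print(err)
print(good)
```

If that assumption holds, it would explain why eight warnings (one per
peak file) come before the single error, and why peakFormat='bed' fixes it.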

Anyway, let me know if you have further trouble.

Cheers,

 - Gord

On 2013-06-19 16:51, "Lawson, Nathan" <Nathan.Lawson at umassmed.edu> wrote:

>
>Gord,
>
>Thanks for the quick reply.
>
>Attached is a .zip file with all of the peaksets (they are not so big
>since they are only from one chromosome) and the sample sheet file.
>
>Below is session info from our cluster as well as from my computer (this
>is a run from the terminal, but I have also run it with the R64 GUI
>console with no problem).
>
>
>
>Session info from HPCC cluster:
>
>> sessionInfo()
>R version 3.0.1 (2013-05-16)
>Platform: x86_64-unknown-linux-gnu (64-bit)
>
>locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C                 LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
>[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>attached base packages:
>[1] parallel  stats     graphics  grDevices utils     datasets  methods
>[8] base
>
>other attached packages:
>[1] DiffBind_1.6.2       Biobase_2.20.0       GenomicRanges_1.12.4
>[4] IRanges_1.18.1       BiocGenerics_0.6.0
>
>loaded via a namespace (and not attached):
>[1] amap_0.8-7         edgeR_3.2.3        gdata_2.12.0.2     gplots_2.11.0.1
>[5] gtools_2.7.1       limma_3.16.5       RColorBrewer_1.0-5 stats4_3.0.1
>[9] zlibbioc_1.6.0
>
>
>
>
>session info from my computer:
>
>> sessionInfo()
>R version 2.15.1 (2012-06-22)
>Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
>locale:
>[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
>attached base packages:
>[1] stats     graphics  grDevices utils     datasets  methods   base
>
>other attached packages:
>[1] DiffBind_1.4.2       Biobase_2.18.0       GenomicRanges_1.10.7
>[4] IRanges_1.16.6       BiocGenerics_0.4.0
>
>loaded via a namespace (and not attached):
> [1] amap_0.8-7         edgeR_3.0.8        gdata_2.12.0       gplots_2.11.0
> [5] gtools_2.7.1       limma_3.14.4       parallel_2.15.1    RColorBrewer_1.0-5
> [9] stats4_2.15.1      zlibbioc_1.4.0
>
>Thanks,
>Nathan
>
>On Jun 19, 2013, at 11:24 AM, Gordon Brown <Gordon.Brown at cruk.cam.ac.uk> wrote:
>
>> Hi, Nathan,
>>
>> I haven't seen messages like these.  The warnings suggest that numbers are
>> being interpreted as strings, and converted to factors, but I can't
>> imagine why that would happen.  Can you send along your sample sheet, and
>> the first few lines of your peaks files?  I'll see if I can reproduce it.
>> Also, can you let me know the sessionInfo() and operating system and
>> version from both your computer and cluster?
>>
>> In case it helps, "dba.count" now has a "bLowMem" option that greatly
>> reduces memory usage in dba.count; perhaps you will be able to get further
>> on your local machine using that option.  You'll have to upgrade to R
>> 3.0.1/Bioconductor 2.12 though.
>>
>> Cheers,
>>
>> - Gord
>>
>> On 2013-06-19 15:23, "Lawson, Nathan" <Nathan.Lawson at umassmed.edu> wrote:
>>
>>>
>>> I am using DiffBind to identify differentially occupied elements from
>>> histone modification ChIP-Seq between two different cell lines.
>>>
>>> I successfully ran the package on my own computer with peaks and mapped
>>> reads limited to a single human chromosome as a test.  The analysis ran
>>> nicely and the results looked good.  Unfortunately, I was not able to run
>>> a full genome-wide analysis due to computational and space limitations.
>>> Therefore, I tried to run the analysis on our high performance computing
>>> cluster, which is when the error appeared.
>>>
>>> When running the SAME EXACT set of files, including the same sample
>>> sheet, on our cluster, I get the following error output, as well as
>>> additional warnings that I have not seen previously:
>>>
>>>> AV = dba(sampleSheet="AVbothChr6.csv")
>>> A1.0 HUAEC K27ac artery  1 raw
>>> A1.1 HUAEC K27ac artery  2 raw
>>> V1.0 HUVEC K27ac vein  1 raw
>>> V1.1 HUVEC K27ac vein  2 raw
>>> A1.2 HUAEC p300 artery  1 raw
>>> A1.3 HUAEC p300 artery  2 raw
>>> V1.3 HUVEC p300 vein  1 raw
>>> V2.6 HUVEC p300 vein  2 raw
>>> Error in if (res >= minval) { : missing value where TRUE/FALSE needed
>>> In addition: Warning messages:
>>> 1: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>> 2: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>> 3: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>> 4: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>> 5: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>> 6: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>> 7: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>> 8: In Ops.factor(peaks[, pCol], width) : / not meaningful for factors
>>>>
>>>
>>>
>>> Again, these datasets were successfully entered as a dba object and
>>> subsequently analyzed using DiffBind on my computer.  The original input
>>> files were simply transferred to the cluster to re-test there.  The only
>>> difference (I can see) is that the cluster is currently running R-3.0.1
>>> and I was running 2.15.
>>>
>>> I also tried making the peaks.bed files into a 6-column format (the
>>> original bed files only had 5 columns), but this did not seem to solve
>>> the problem.
>>>
>>> Any suggestions are welcome.
>>>
>>> Thanks,
>>> Nathan
>>>
>>> Nathan D. Lawson, Ph.D.
>>> Associate Professor
>>> Program in Gene Function and Expression
>>> University of Massachusetts Medical School
>>> 364 Plantation Street
>>> LRB617
>>> Worcester, MA 01605
>>> website: lawsonlab.umassmed.edu
>>> email: nathan.lawson at umassmed.edu
>>> phone: 508-856-1177
>>>