[BioC] read gz compressed wig files?

Hamid Bolouri hbolouri at gmail.com
Sat Apr 2 02:43:04 CEST 2011


Thanks for the suggestion Michael.

FYI: when I ran import(x, format = "wigLines") on a ~220 MB (unzipped)
ENCODE wig file, R crashed after about an hour with the terse
Unix (Ubuntu) message 'Killed'. I am guessing this is a memory-limit issue
(which is also what I get when trying the same command on a Windows PC).
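
If memory really is the limit, one possible workaround is to stream the
file in chunks with base R instead of parsing it in one go. The following
is only a minimal sketch under stated assumptions: the file contains
fixedStep blocks only (as in the ENCODE example quoted below), and
summarise_fixedstep() and its chunk_lines argument are hypothetical names
made up for illustration, not part of rtracklayer. It keeps a per-chromosome
count and mean rather than the full signal, so memory use stays bounded by
the chunk size.

```r
# Sketch: per-chromosome summary of a fixedStep wig file, read in chunks.
# summarise_fixedstep() is a hypothetical helper, not an rtracklayer function.
summarise_fixedstep <- function(path, chunk_lines = 100000L) {
  con <- gzfile(path, open = "rt")   # gzfile() also reads uncompressed files
  on.exit(close(con))
  chrom <- NA_character_             # chromosome in force at the end of the last chunk
  n <- numeric(0)                    # running value counts, named by chromosome
  total <- numeric(0)                # running value sums, named by chromosome
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break
    # Drop blank lines and any track/browser/comment lines; a track line is optional here
    lines <- lines[nzchar(lines) & !grepl("^(track|browser|#)", lines)]
    if (length(lines) == 0L) next
    is_decl <- grepl("^fixedStep", lines)
    decl_chrom <- sub(".*chrom=([^[:space:]]+).*", "\\1", lines[is_decl])
    # Block 0 continues the chromosome carried over from the previous chunk
    block <- cumsum(is_decl)
    line_chrom <- c(chrom, decl_chrom)[block + 1L]
    vals <- as.numeric(lines[!is_decl])
    vchrom <- line_chrom[!is_decl]
    if (length(vals) > 0L) {
      cnt <- tapply(vals, vchrom, length)
      s <- tapply(vals, vchrom, sum)
      for (k in names(cnt)) {
        n[k] <- sum(n[k], cnt[k], na.rm = TRUE)
        total[k] <- sum(total[k], s[k], na.rm = TRUE)
      }
    }
    chrom <- line_chrom[length(line_chrom)]
  }
  data.frame(chrom = as.character(names(n)),
             n = unname(n),
             mean = unname(total / n),
             stringsAsFactors = FALSE)
}
```

Usage would be along the lines of summarise_fixedstep("some_file.wig.gz"),
where the file name is a placeholder. Lowering chunk_lines trades speed for
a smaller memory footprint; variableStep files would need extra handling.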

Thanks

Hamid
FYI, sessionInfo for a restarted session:
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rtracklayer_1.10.6 RCurl_1.4-3        bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] Biobase_2.10.0      Biostrings_2.18.0   BSgenome_1.18.2
[4] GenomicRanges_1.2.1 IRanges_1.8.8       tools_2.12.0
[7] XML_3.2-0


On Fri, Apr 1, 2011 at 2:33 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Btw, rtracklayer has an internal function, import.wigLines() that can parse
> the lines after the track line. Could try using that.  Can use import(x,
> format = "wigLines") to get there.
>
> On Fri, Apr 1, 2011 at 1:25 PM, Hamid Bolouri <hbolouri at gmail.com> wrote:
>>
>> Martin, Steve; thank you both very much.
>>
>> Pretty amazing that hundreds of ENCODE data files may be
>> 'non-standard'. Lesson learnt.
>>
>> Thanks again,
>>
>> Hamid
>>
>> On Fri, Apr 1, 2011 at 12:21 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> > On 04/01/2011 08:29 AM, Steve Lianoglou wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Thu, Mar 31, 2011 at 7:03 PM, Hamid Bolouri<hbolouri at gmail.com>
>> >>  wrote:
>> >>>
>> >>> Thanks Steve;
>> >>>
>> >>>>
>> >>>>
>> >>>> import.wig(gzopen('C:\\...pathto...\\wgEncodeBroadChipSeqSignalK562H3k9me1.wig.gz'))
>> >>>
>> >>> Error in
>> >>>
>> >>> import.wig(gzopen("C:\\Users\\hbolouri\\Desktop\\ENCODE_data\\wgEncodeBroadChipSeqSignalK562H3k9me1.wig.gz"))
>> >>> :
>> >>>  error in evaluating the argument 'con' in selecting a method for
>> >>> function 'import.wig'
>> >>>
>> >>>> traceback()
>> >>>
>> >>> 1:
>> >>>
>> >>> import.wig(gzopen("C:\\...\\wgEncodeBroadChipSeqSignalK562H3k9me1.wig.gz"))
>> >>>
>> >>>
>> >>> using gzfile instead of gzopen avoids the error message, but seems to
>> >>> produce an empty object
>> >>>>
>> >>>>
>> >>>>
>> >>>> import.wig(gzfile('C:\\...pathto...\\wgEncodeBroadChipSeqSignalK562H3k9me1.wig.gz'))
>> >>>
>> >>> RangedDataList of length 0
>> >>
>> >> If you unzip the file and read it in "as normal", does it work
>> >> differently?
>> >
>> > I think the basic problem is that these files are not strictly wiggle,
>> > missing an initial 'track' line:
>> >
>> >> head wgEncodeBroadChipSeqSignalHepg2H3k27ac.wig
>> > fixedStep chrom=chr1 start=1 step=25
>> > 113
>> > 136
>> >
>> > Martin
>> >
>> >>
>> >
>> >
>> > --
>> > Computational Biology
>> > Fred Hutchinson Cancer Research Center
>> > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>> >
>> > Location: M1-B861
>> > Telephone: 206 667-2793
>> >
>>
>>
>>
>> --
>> http://labs.fhcrc.org/bolouri
>
>



-- 
http://labs.fhcrc.org/bolouri


