[R] strangely long floating point with write.table()

Mike Miller mbmiller+l at gmail.com
Tue Mar 18 01:43:54 CET 2014


On Mon, 17 Mar 2014, Duncan Murdoch wrote:

> On 14-03-17 6:22 PM, Mike Miller wrote:
>
>> Thanks!  Another thing I've figured out:  Use of "drop0trailing=T" in
>> format() fixes the .00000 stuff that I didn't like:
>> 
>> write.table(format(data[1:10,], digits=5, trim=T, drop0trailing=T), row.names=F, col.names=F, quote=F)
[snip]
>>
>> I still have more to figure out, but for most smaller table-writing 
>> jobs, I think something like the last command above will be my usual 
>> approach. In real life, I would use a tab delimiter, though.
>> 
>> I'm still unsure about the best way to deal with very large data 
>> frames, though.  There is probably a good way to stream data into a 
>> file so that the formatted copy doesn't have to be built as an 
>> additional large object in memory.  There must be a way to open a 
>> connection and then just pipe the formatted data into it.  Maybe 
>> something related to sprintf() will work.
>
> You've never explained why you want to write these gigantic text files. 
> Text is a lossy way to store numbers:  it takes 15 bytes to store about 
> 8 bytes of information, and you'll probably lose a few bits at the end. 
> Why not write your files in binary, storing exactly what you have in 
> memory?  It'll be a lot faster to write and to read, you won't need 
> to duplicate the data before writing, etc.


Thanks for asking, Duncan.  A typical problem is that I am running 12 
processes at once on a 12-core machine with 32 GB of RAM, so each process 
has to be limited to about 2.5 GB total.  Then I try to load as much data 
as I can within that limitation.  The output data does not always need to 
be in text format, but it usually does because it has to be read by other 
programs.
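
When the output can be binary, I take it Duncan's suggestion could be 
as simple as this (a minimal sketch; the matrix m and the file name 
are only illustrations):

m <- matrix(rnorm(1e6), ncol = 10)      # example data
con <- file("data.bin", "wb")           # binary connection
writeBin(as.vector(m), con)             # exact 8-byte doubles, no rounding
close(con)

con <- file("data.bin", "rb")
v <- readBin(con, what = "double", n = length(m))
close(con)
m2 <- matrix(v, ncol = ncol(m))         # dimensions aren't stored in the file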

I was hoping I could take a row from a data frame and format it like 
this:

> sprintf(c(rep("%s",2), rep("%d",2), rep("%.4f",4)), data[1,1:8])

But sprintf() takes its values as vectors, so a data frame row goes in 
as a single list rather than one argument per column, and coercing it 
to a vector would force everything to a single type.
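
One workaround, sketched with the column types assumed (columns 3 and 
4 would have to be stored as integers for %d to work): collapse the 
formats into a single string and hand the row's columns to sprintf() 
as separate arguments with do.call():

fmt <- paste(c(rep("%s", 2), rep("%d", 2), rep("%.4f", 4)), collapse = "\t")
do.call(sprintf, c(fmt, as.list(data[1, 1:8])))   # one formatted line

And since sprintf() is vectorized, the same idea should let me stream 
a large data frame through a connection in chunks, without building 
the whole formatted copy in memory (the chunk size is arbitrary):

con <- file("out.txt", "w")
chunk <- 10000
for (i in seq(1, nrow(data), by = chunk)) {
    j <- min(i + chunk - 1, nrow(data))
    writeLines(do.call(sprintf, c(fmt, as.list(data[i:j, 1:8]))), con)
}
close(con)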

Thanks for your help.

Mike



