[R] Faster Printing Alternatives to 'cat'

gundalav gundalav at gmail.com
Sat Jan 17 14:59:43 CET 2009


Dear Jim and all,

Allow me to ask your expert opinion.


Using the data (16Mb) downloadable from here:

http://drop.io/gundalav/asset/test-data-zip


It took this long on a 1994.070 MHz CPU under Linux, using
"write.table":

> proc.time() - ptm1
     user    system   elapsed
16581.833  5787.228 21386.064



__MYCODE__

args <- commandArgs(trailingOnly=FALSE)
fname <- args[3]
dat <- read.delim(fname, header=FALSE)

output <- file('output_writetable.txt', 'w')

ptm1 <- proc.time()
for (i in 1:nrow(dat)) {
    # one write.table() call per row of dat
    #cat(dat$V1[i]," ", as.character(dat$V2[i]),"\n", sep="")
    write.table(cbind(dat$V1[i], as.character(dat$V2[i])),
                file=output, sep="\t", quote=FALSE,
                col.names=FALSE, row.names=FALSE)
}

close(output)
proc.time() - ptm1
__END__

Perhaps I misunderstood you, but this still seems
truly slow. Is there a way I can speed it up?
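
For comparison, here is a minimal sketch of the chunked approach from Jim's
reply quoted below, adapted to a two-column data frame like "dat". It calls
write.table once per block of n rows instead of once per row. The helper
name write_in_chunks, the toy data, the chunk size and the temporary output
file are all made up for illustration, and this has not been timed on the
16Mb file, so treat it as a sketch rather than a measured fix.

```r
# Hypothetical helper (not from the thread): write a data frame to 'path'
# in chunks of 'n' rows, so write.table() runs once per chunk, not per row.
write_in_chunks <- function(dat, path, n = 1000) {
    output <- file(path, 'w')
    on.exit(close(output))            # close the connection even on error
    if (nrow(dat) == 0) return(invisible(0))
    starts <- seq(1, nrow(dat), by = n)
    for (s in starts) {
        rows <- s:min(s + n - 1, nrow(dat))
        write.table(dat[rows, , drop = FALSE], file = output, sep = "\t",
                    quote = FALSE, col.names = FALSE, row.names = FALSE)
    }
    invisible(length(starts))         # number of chunks written
}

# Toy stand-in for the real data (values invented for illustration).
dat <- data.frame(V1 = c("ATCG", "CCGG", "GGTA", "TTAC", "ACGT"),
                  V2 = c(10, 2, 7, 1, 4),
                  stringsAsFactors = FALSE)
out_path <- tempfile(fileext = ".txt")
n_chunks <- write_in_chunks(dat, out_path, n = 2)
written <- readLines(out_path)
```

Since "dat" is already fully in memory after read.delim, the limiting case
is a single write.table(dat, ...) call on the whole data frame, which is
the same idea with one chunk.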


On Jan 8, 11:11 pm, "jim holtman" <jholt... at gmail.com> wrote:
> Here is one way of doing it.  To write out 1 million rows on my system
> took 21 seconds.
>
> > # create some data
> > dataSize <- 1e6
> > foo <- runif(dataSize)
> > bar <- runif(dataSize)
> > n <- 1000  # number of items to write out each time
> > output <- file('/output.txt', 'w')
> > # now split the indices into groups of 'n'
> > index <- split(seq(length(foo)), cut(seq(length(foo)), length(foo) / n, labels=FALSE))
> > my.stats(reset=TRUE)
>
> stats (1) - Rgui : <0.0 0.0> 73738.9 : 185.1MB
> > for (i in index){
> +     write.table(cbind(foo[i], bar[i]), file=output, sep='\t',
> +         col.names=FALSE, row.names=FALSE)
> + }
> > close(output)
> > my.stats('done')
> done (1) - Rgui : <20.7 20.7> 73759.6 : 124.6MB
>
> On Thu, Jan 8, 2009 at 8:26 AM, GundalaViswanath <gunda... at gmail.com> wrote:
> > Dear Jim and Henrik,
>
> >> What exactly is the problem you are trying to solve.
> >> Is it going to be read by some other program?
>
> > I simply want to print the data out. Surely this data
> > will be manipulated (with Excel or other
> > programming languages) by other people to suit their purposes.
>
> > Typically the print out from the loop looks  like this:
>
> > ATCGATCGATCGGGGGGGGGGGGGGGTTTGCGGG   10   11.992
> > CCCCCCCCGGGCCATCGGTCAGGGAATTGACGGAA   2      0.222
> > .....
> > up to ~16 million lines.
>
> >> How much physical memory do you have on your machine?
> > 6GB
>
> >> Is there paging occurring due to the size of the objects?
> > I don't quite understand what you mean by that.
> > So sorry for my lack of knowledge in R.
>
> >> Have you considered creating a structure with 10,000 of the variables
> >> each time through the loop and then writing them out?
>
> > Never thought about that. Can you be specific about how this can be achieved?
>
> > -GundalaViswanath
> > Jakarta - Indonesia
>
> > On Thu, Jan 8, 2009 at 10:10 PM, jim holtman <jholt... at gmail.com> wrote:
> >> What exactly is the problem you are trying to solve.  What is going to
> >> be done with the data?  Is it going to be read by some other program?
> >> How much physical memory do you have on your machine?  Is there paging
> >> occurring due to the size of the objects?  Have you considered creating a
> >> structure with 10,000 of the variables each time through the loop and
> >> then writing them out?  A lot will depend on how much free memory you
> >> have.  I will also ask one of my favorite questions; "tell me what you
> >> want to do, not how you want to do it".
>
> >> On Thu, Jan 8, 2009 at 6:12 AM, GundalaViswanath <gunda... at gmail.com> wrote:
> >>> Dear all,
>
> >>> I found that printing with 'cat' is very slow.
>
> >>> For example, on my machine this snippet
>
> >>> __BEGIN__
>
> >>> # I need to resort to this type of loop,
> >>> # because using write(), I need to create a matrix which
> >>> # consumes so much memory. Note that the "foo, bar, qux" objects
> >>> # are already very large (>2Gb)
>
> >>> for ( s in 1:length(x) ) {
> >>>    cat(as.character(foo[s]),"\t",bar[s],"\t", qux[s],"\n")
> >>> }
> >>> __END__
>
> >>> for "x" of size ~1.5 million, takes more than 10 hours to print,
> >>> on my Linux 1994 MHz AMD processor.
>
> >>> Are there any faster alternatives to "cat"?
>
> >>> -GundalaViswanath
> >>> Jakarta - Indonesia
>
> >>> ______________________________________________
> >>> R-h... at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
>
> >> --
> >> Jim Holtman
> >> Cincinnati, OH
> >> +1 513 646 9390
>
> >> What is the problem that you are trying to solve?
>



