[R] merging pre-sorted data frames
mbmiller+l at gmail.com
Wed Jan 14 01:55:00 CET 2015
I have many pairs of data frames. Each data frame has about 15 million
records, and the two frames in a pair have about 10 million records in
common. Both are sorted by two of their fields, and they will be merged
on those same two fields.
The fact that the data are sorted could be used to greatly speed up a
merge, but I have the impression that merge() cannot "know" in advance
that the fields are already sorted.
I'm sure that I can use merge(), but I suspect that it does a lot of
unnecessary work (re-sorting or otherwise matching keys that are already
in order) and that it will take much more time than the job really should
require. Is that correct? Can anything be done about it?
The inspiration for my question comes partly from the way GNU comm works:
it compares two sorted files line by line in a single pass, without
re-sorting either one.
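To make the idea concrete, here is a minimal sketch of the kind of single-pass merge I have in mind. It is only an illustration, not a proposal for how merge() should work: the function name sorted_inner_join and the example columns (chr, pos, a, b) are made up, and it assumes that each key combination occurs at most once per data frame and that both frames are sorted in the same order that `<` uses for the key columns.

```r
# Hypothetical helper (not part of base R): inner join of two data frames
# that are ALREADY sorted by the key columns, in one pass over both.
# Assumes each key combination is unique within each data frame.
sorted_inner_join <- function(x, y, keys) {
  nx <- nrow(x)
  ny <- nrow(y)
  ix <- integer(min(nx, ny))  # matched row numbers in x
  iy <- integer(min(nx, ny))  # matched row numbers in y
  i <- 1L
  j <- 1L
  n <- 0L

  # Compare the key tuple of row i in x with row j in y:
  # returns -1 if x's key is smaller, 1 if larger, 0 if equal.
  cmp <- function(i, j) {
    for (k in keys) {
      a <- x[[k]][i]
      b <- y[[k]][j]
      if (a < b) return(-1L)
      if (a > b) return(1L)
    }
    0L
  }

  # Advance whichever side has the smaller key; record equal keys.
  while (i <= nx && j <= ny) {
    s <- cmp(i, j)
    if (s < 0L) {
      i <- i + 1L
    } else if (s > 0L) {
      j <- j + 1L
    } else {
      n <- n + 1L
      ix[n] <- i
      iy[n] <- j
      i <- i + 1L
      j <- j + 1L
    }
  }

  ix <- ix[seq_len(n)]
  iy <- iy[seq_len(n)]
  left <- x[ix, , drop = FALSE]
  right <- y[iy, setdiff(names(y), keys), drop = FALSE]
  rownames(left) <- NULL
  rownames(right) <- NULL
  cbind(left, right)
}

# Toy example with two integer key fields (made-up data).
x <- data.frame(chr = c(1L, 1L, 2L, 2L), pos = c(10L, 20L, 5L, 30L),
                a = c("p", "q", "r", "s"), stringsAsFactors = FALSE)
y <- data.frame(chr = c(1L, 2L, 2L, 3L), pos = c(20L, 5L, 40L, 1L),
                b = c(0.1, 0.2, 0.3, 0.4))
res <- sorted_inner_join(x, y, c("chr", "pos"))
```

Each row of either data frame is visited at most once, so the work grows linearly with the number of records. An interpreted R loop like this would probably be too slow for 15 million rows, so in practice the same logic would need to run in compiled code; the sketch is only meant to show the algorithm.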
If you have any ideas about this, I'd love to hear them.
Thanks in advance.
Michael B. Miller, Ph.D.
University of Minnesota