[Rd] Memory allocation in read.table

Hadley Wickham h.wickham at gmail.com
Wed Aug 28 18:17:46 CEST 2013

Hi all,

I've been trying to learn more about memory profiling in R, and I've
been trying it out on read.table. I'm getting a somewhat strange
result, and I hope that someone might be able to explain why.

After running

Rprof("read-table.prof", memory.profiling = TRUE, line.profiling = TRUE,
  gc.profiling = TRUE, interval = interval)
diamonds <- read.table("diamonds.csv", sep = ",", header = TRUE)

and doing a lot of data manipulation, I end up with a table that
displays the total memory (in megabytes) allocated and released (by
gc) from each line of (a local copy of) read.table:

          file line  alloc release
1 read-table.r  122 1.9797  1.1435
2 read-table.r  165 1.1148  0.6511
3 read-table.r  221 0.0763  0.0321
4 read-table.r  222 0.4922  1.5057
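
For a coarser cross-check of a table like this, summaryRprof can
aggregate the memory column of a profile by function. The following is
only a sketch, not the data manipulation I actually did: it assumes R
was built with memory profiling enabled, and the temporary CSV, the row
count and the 0.01 s interval are all made-up stand-ins for
illustration.

```r
# Hypothetical stand-in data; the original diamonds.csv is not reproduced here.
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(x = runif(5e5), y = runif(5e5)), csv, row.names = FALSE)

# Profile a single read.table call, recording memory and gc activity.
# The interval (in seconds) is an assumed value chosen for illustration.
prof <- tempfile(fileext = ".prof")
Rprof(prof, memory.profiling = TRUE, gc.profiling = TRUE, interval = 0.01)
df <- read.table(csv, sep = ",", header = TRUE)
Rprof(NULL)  # stop profiling

# memory = "both" adds memory information to the usual timing summary.
prof_summary <- summaryRprof(prof, memory = "both")
print(head(prof_summary$by.total))
```

This only gives per-function totals, so it cannot by itself separate
lines 221 and 222, but it should show whether the overall magnitudes
are plausible.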

Lines 122 and 165 are where I expect to see big allocations and
releases - they're calling scan and convert.type respectively. Lines
221 and 222 are more of a mystery:

    class(data) <- "data.frame"
    attr(data, "row.names") <- row.names

Why do those lines need any allocations? I thought class<- and attr<-
were primitives, and hence would modify in place.
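
One way to check whether these two calls duplicate the object would be
tracemem. This is a minimal sketch, assuming R was built with memory
profiling enabled (capabilities("profmem")); the list below is a
made-up stand-in for the object that read.table builds, not the real
one.

```r
# Hypothetical stand-in for the list that read.table builds internally.
n <- 1e5
data <- list(x = runif(n), y = runif(n))
row.names <- seq_len(n)

# tracemem is only available when R was built with memory profiling.
traced <- capabilities("profmem")

# Once marked, R prints a message whenever it duplicates `data`.
if (traced) tracemem(data)

class(data) <- "data.frame"
attr(data, "row.names") <- row.names

if (traced) untracemem(data)
```

Any "tracemem[...]" message printed between the two marker calls would
point to a copy made by that assignment. Note that tracemem reports
duplications only, so allocations that are not copies of data would not
show up here.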

Re-running with gctorture(TRUE) yields roughly similar numbers,
although there is no memory release because gc is called earlier, and
the assignment of allocations to line is probably more accurate given
that gctorture runs the code about 20x slower:

           file line    alloc  release
25 read-table.r  221 0.387299 0.00e+00
26 read-table.r  222 0.362964 0.00e+00

The whole object, when loaded, is ~4 MB, so those allocations
represent fairly sizeable chunks of the total.

Any suggestions would be greatly appreciated.  Thanks!


Chief Scientist, RStudio
