[BioC] working with large dataframes in R

Elena Sorokin sorokin at wisc.edu
Wed May 25 22:09:21 CEST 2011


Hello,

I was advised to ask for help on this list. When I work with large
tables of count data (or any other type of data, for that matter), R
runs out of memory. Specifically, I'm trying to visualize a data set of
count data (55,840 rows by 4 columns) with the ggplot2 graphics package,
and when I try to make a scatterplot colored by a grouping column, I get
an error message. I've pasted example code below, along with a
description of the data frame. Any advice on how to store this
data.frame object in a less memory-intensive way would be greatly
appreciated. Should I just increase my memory limit? Alternatively, I
don't know anything about SQL or relational databases, but I am willing
to learn if that is really the key to working with large objects in R.

Sincerely,
Elena
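
One way to check whether the data frame itself is what uses the memory is to measure it with object.size(). The sketch below does this on a simulated stand-in with the same dimensions (55,840 rows by 4 columns). The names demo and size_mb and the rpois() counts are assumptions for illustration only, not the real data.

```r
# Simulated stand-in for the real table: 55,840 rows by 4 columns.
# The counts are random Poisson draws, NOT the poster's data.
set.seed(1)
n <- 55840
demo <- data.frame(
  X.val      = rpois(n, lambda = 5),
  Y.val      = rpois(n, lambda = 5),
  time.value = factor(rep(1, n)),
  graph.type = factor(rep(c("D1vD2", "U1vU2"), each = n / 2))
)

# object.size() reports the memory used by the object in bytes.
size_mb <- as.numeric(object.size(demo)) / 1024^2
print(dim(demo))
print(round(size_mb, 2))
```

If the reported size is small compared with the memory limit, the storage of the data frame is probably not the bottleneck, and the allocation is more likely happening inside the plotting call.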

 > library(ggplot2)
# I already loaded my data into a data frame object using read.delim
 > summary(df)
      X.val           Y.val       time.value graph.type
  0      :20642   0      :20737   1:55840    D1vD2:27920
  1      : 2139   1      : 2310              U1vU2:27920
  2      : 1162   2      : 1150
  3      :  774   3      :  797
  4      :  607   4      :  572
  5      :  535   5      :  513
  (Other):29981   (Other):29761
 > class(df)
[1] "data.frame"
 > dim(df)
[1] 55840     4
 > qplot(X.val, Y.val, data = df, colour = graph.type)
Error: cannot allocate vector of size 119.2 Mb
In addition: Warning messages:
1: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
   Reached total allocation of 1535Mb: see help(memory.size)
2: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
   Reached total allocation of 1535Mb: see help(memory.size)
3: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
   Reached total allocation of 1535Mb: see help(memory.size)
4: In paste(rep(l, length(lvs)), rep(lvs, each = length(l)), sep = sep) :
   Reached total allocation of 1535Mb: see help(memory.size)
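
The summary() output above lists level counts for X.val and Y.val, which is how R prints factors rather than numbers. The paste(rep(l, length(lvs)), ...) call in the warnings is also consistent with factor levels being combined. If the columns really are factors, converting them to numeric before plotting may help. The sketch below is a minimal illustration on a made-up four-row data frame. The names demo and to_number are hypothetical, and whether this resolves the error above is untested.

```r
# Hypothetical miniature of the situation: counts stored as factors.
demo <- data.frame(
  X.val = factor(c("0", "10", "2", "0")),
  Y.val = factor(c("1", "0", "25", "3"))
)

# as.numeric() applied directly to a factor returns the internal level
# codes, not the labels, so convert through character first.
to_number <- function(f) as.numeric(as.character(f))

demo$X.val <- to_number(demo$X.val)
demo$Y.val <- to_number(demo$Y.val)

# Both columns should now be reported as numeric.
str(demo)
```

Running str(df) on the real data frame would show whether X.val and Y.val are factors. If they are, read.delim may have found non-numeric entries in those columns, and those entries would become NA after this conversion.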

 > sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2

loaded via a namespace (and not attached):
[1] tools_2.13.0


