[R] memory management

Mon Oct 30 17:34:48 CET 2006

Hi All,

just a quick (?) question while I wait for my code to run...

I'm checking whether the rows of a data frame are identical, doing all possible 
pairwise comparisons. In doing so I use identical(), but that's by the way. I'm 
doing a (not so) quick and dirty check, and subsetting the data as

data[row.numb,]

and

data[a different row,]

I suspect the problem there is that I load the whole frame data[,] into memory 
every time, making the biz quite slow and wasteful. As I'm idly waiting, I 
thought: had I put every row of data[,] as an item of a list, then done my 
pairwise comparisons using the list, would I have had better performance?
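
To make the two alternatives concrete, here is a rough sketch on a small made-up 
data frame. The toy `data` and the names `pairs`, `rows`, `same_df` and 
`same_list` are assumptions for illustration only, not the real code. It shows 
the same pairwise check done by subsetting the data frame each time and by 
splitting the rows into a list once. It makes no claim about which is faster.

```r
# Toy data frame standing in for the real `data` (made up for illustration);
# rows 1 and 3 are identical.
data <- data.frame(a = c(1, 2, 1, 3),
                   b = c("x", "y", "x", "x"),
                   stringsAsFactors = FALSE)

# All pairs of row indices i < j, one pair per column.
pairs <- combn(nrow(data), 2)

# Approach 1: subset the data frame inside every comparison.
# as.list() drops the row names, which would otherwise make identical() FALSE.
same_df <- vapply(seq_len(ncol(pairs)), function(k) {
  identical(as.list(data[pairs[1, k], ]), as.list(data[pairs[2, k], ]))
}, logical(1))

# Approach 2: split the rows into a list once, then compare list items.
rows <- lapply(seq_len(nrow(data)), function(i) as.list(data[i, ]))
same_list <- vapply(seq_len(ncol(pairs)), function(k) {
  identical(rows[[pairs[1, k]]], rows[[pairs[2, k]]])
}, logical(1))

# Both approaches should flag the same pairs.
stopifnot(identical(same_df, same_list))

# Index pairs of identical rows (one pair per column).
identical_pairs <- pairs[, same_list, drop = FALSE]
```

Wrapping each approach in system.time() on the real data would show whether the 
list version actually helps.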

(do I win the prize for the most convoluted sentence sent to R-help?)

For the pedants, yes, I know I could kill the process and try it myself, but the 
spirit of the question is: is there a way of dealing with big data *efficiently*?

Best,

Fede

-- 
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel  +44 (0)20 7594 1602     Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com