# [R] Calculating distance matrix for large dataset

David Carlson dcarlson at tamu.edu
Fri May 3 15:36:23 CEST 2013

Here's the result on R 3.0.0 64 bit under Windows 8:

> A<-matrix(1:365000*144,nrow=365000,ncol=144)
> dim(A)
[1] 365000    144
> d <- dist(mydata_nor, method = "euclidean")
> d <- dist(A, method = "euclidean")
Error: cannot allocate vector of size 496.3 Gb
1: In dist(A, method = "euclidean") :
Reached total allocation of 8078Mb: see help(memory.size)
2: In dist(A, method = "euclidean") :
Reached total allocation of 8078Mb: see help(memory.size)
3: In dist(A, method = "euclidean") :
Reached total allocation of 8078Mb: see help(memory.size)
4: In dist(A, method = "euclidean") :
Reached total allocation of 8078Mb: see help(memory.size)
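
As a rough check on where the 496.3 Gb figure comes from, the arithmetic can be sketched as follows. It assumes that dist() returns the lower triangle of the distance matrix as n * (n - 1) / 2 double-precision values of 8 bytes each; the variable names are only illustrative.

```r
# Size of the object dist() would have to return for n rows.
# Assumption: the result holds n * (n - 1) / 2 doubles of 8 bytes each.
n <- 365000
n_pairs <- n * (n - 1) / 2          # computed in double arithmetic, so no integer overflow
size_gib <- n_pairs * 8 / 1024^3    # bytes converted to binary gigabytes

n_pairs                             # number of pairwise distances, about 6.66e10
round(size_gib, 1)                  # 496.3
n_pairs > .Machine$integer.max      # TRUE: more elements than a 32-bit length can count
```

The last line is probably why an R version without long-vector support reports "negative length vectors are not allowed" rather than a memory error, although that explanation is an assumption and was not checked against the dist() source.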

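The error shows that the distance matrix is far beyond this machine's memory:
it needs 496.3 Gb, and R could allocate only 8078 Mb here. Unless you have
access to a computer with 500 gigabytes of memory, you need to consider
alternative approaches such as aggregating the data into longer time blocks
or using kmeans.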
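
A minimal sketch of both suggestions, run on a small simulated stand-in rather than the real data. The object names, the row order (all days of one sample stored together), the reading of "longer time blocks" as one mean daily profile per sample, and the choice of 4 clusters are all assumptions made for illustration.

```r
set.seed(1)                                  # reproducible simulated data
n_samples <- 20; n_days <- 30; n_obs <- 144  # scaled-down version of 1000 x 365 x 144

# Hypothetical stand-in for mydata_nor: one row per sample-day, 144 readings per row
x <- matrix(rnorm(n_samples * n_days * n_obs), ncol = n_obs)
sample_id <- rep(seq_len(n_samples), each = n_days)  # assumed row order

# Option 1: aggregate each sample's days into a single mean daily profile,
# then compute distances between samples instead of between sample-days.
profile <- rowsum(x, sample_id) / n_days     # n_samples x n_obs matrix of means
d <- dist(profile, method = "euclidean")     # n_samples * (n_samples - 1) / 2 distances
hc <- hclust(d)                              # e.g. hierarchical clustering of the samples

# Option 2: kmeans clusters the rows directly and never builds the
# full pairwise distance matrix.
km <- kmeans(x, centers = 4, nstart = 5, iter.max = 50)  # centers = 4 is arbitrary
table(km$cluster)                            # cluster sizes
```

With the real data, option 1 would leave 1000 rows, so dist() would need fewer than half a million distances; option 2 keeps all 365000 rows but its memory use grows with the number of rows and clusters rather than with the square of the number of rows.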

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of HJ YAN
Sent: Thursday, May 2, 2013 6:02 PM
To: r-help at r-project.org
Subject: [R] Calculating distance matrix for large dataset

Dear R users

I wondered if any of you have ever tried to calculate a distance matrix with a
very large data set, and if anyone out there can confirm that this error
message I got actually means that my data is too large for this task:

negative length vectors are not allowed

My data size and code used

> dim(mydata_nor)
[1] 365000    144
> d <- dist(mydata_nor, method = "euclidean")

My data has 1000 samples, each with a year of data observed at 10-minute
intervals every day, so the size is (365 * 1000) * 144.

I checked the manual for the function 'dist' but cannot see an upper limit on
the size allowed, and I bet there should be one, so any hints are appreciated.

I would also be grateful if anyone could suggest another method for
calculating a distance matrix for a large dataset.

I appreciate that reproducible code should be provided when asking for advice,
so try the code below if needed:

> A<-matrix(1:365000*144,nrow=365000,ncol=144)
> dim(A)
[1] 365000    144
> d1<-dist(A,method="euclidean")
Error in dist(A, method = "euclidean") :
  negative length vectors are not allowed

HJ

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help