[R] Calculating distance matrix for large dataset
David Carlson
dcarlson at tamu.edu
Fri May 3 15:36:23 CEST 2013
Here's the result on R 3.0.0 64-bit under Windows 8:
> A<-matrix(1:365000*144,nrow=365000,ncol=144)
> dim(A)
[1] 365000 144
> d <- dist(mydata_nor, method = "euclidean")
Error in as.matrix(x) : object 'mydata_nor' not found
> d <- dist(A, method = "euclidean")
Error: cannot allocate vector of size 496.3 Gb
In addition: Warning messages:
1: In dist(A, method = "euclidean") :
Reached total allocation of 8078Mb: see help(memory.size)
2: In dist(A, method = "euclidean") :
Reached total allocation of 8078Mb: see help(memory.size)
3: In dist(A, method = "euclidean") :
Reached total allocation of 8078Mb: see help(memory.size)
4: In dist(A, method = "euclidean") :
Reached total allocation of 8078Mb: see help(memory.size)
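The 496.3 Gb figure in the error can be checked by hand. dist() stores one double (8 bytes) for each of the n * (n - 1) / 2 row pairs. The lines below are only a rough check of that arithmetic, not a description of how dist() allocates internally. The last line shows that the pair count is also larger than the biggest 32-bit integer, which may be why some R versions report "negative length vectors" instead; that explanation is an assumption and is not confirmed in this thread.

```r
# Number of pairwise distances that dist() must store for n rows
n <- 365000                      # rows in the matrix (numeric, so no integer overflow here)
n_pairs <- n * (n - 1) / 2       # lower triangle, excluding the diagonal
bytes <- n_pairs * 8             # each distance is an 8-byte double
gib <- bytes / 2^30              # R reports sizes in units of 2^30 bytes as "Gb"
round(gib, 1)

# Compare the pair count with the largest value a 32-bit integer can hold
n_pairs > .Machine$integer.max
```

The number of columns (144) does not enter the calculation; only the number of rows drives the size of the result.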
Your message suggests that your system could not accurately compute the
requirements. Unless you have access to a computer with about 500 gigabytes
of memory, you need to consider alternative approaches such as aggregating
the data into longer time blocks or using kmeans.
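A minimal sketch of both suggestions on a small simulated matrix follows. Everything specific in it is an assumption made for illustration: the row layout (all days of one sample, then the next sample), the 10-day block length, the choice of 4 clusters, and the names sample_id, block_id and x_block. None of these come from the original data.

```r
set.seed(1)

# Small simulated stand-in: 20 samples x 60 days, 144 readings per day
n_samples <- 20
n_days <- 60
x <- matrix(rnorm(n_samples * n_days * 144), ncol = 144)

# Hypothetical row layout: rows are ordered by sample, then by day
sample_id <- rep(seq_len(n_samples), each = n_days)
block_len <- 10                                        # days per block (assumed)
block_id <- rep((seq_len(n_days) - 1) %/% block_len, times = n_samples)

# Option 1: average the daily rows within each sample-by-block group,
# which reduces the number of rows that dist() has to compare
group <- paste(sample_id, block_id, sep = "_")
x_block <- rowsum(x, group) / block_len                # 120 x 144 matrix of block means
d_block <- dist(x_block, method = "euclidean")         # 120 * 119 / 2 distances

# Option 2: cluster the rows directly with kmeans(), which never forms
# the full pairwise distance matrix
km <- kmeans(x, centers = 4, nstart = 5, iter.max = 50)
table(km$cluster)
```

Dividing by block_len works here only because every group has exactly block_len rows; with unequal group sizes, divide by as.vector(table(group)) instead.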
-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of HJ YAN
Sent: Thursday, May 2, 2013 6:02 PM
To: r-help at r-project.org
Subject: [R] Calculating distance matrix for large dataset
Dear R users
I wondered if any of you have ever tried to calculate a distance matrix for
a very large data set, and whether anyone out there can confirm that this
error message actually means that my data is too large for this task:
negative length vectors are not allowed
My data size and code used:
> dim(mydata_nor)
[1] 365000 144
> d <- dist(mydata_nor, method = "euclidean")
My data has 1000 samples, each with a year of daily observations at
10-minute intervals, so the size is (365 * 1000) * 144.
I checked the manual for the function 'dist' but cannot see an upper limit
on the size allowed, and I bet there should be one, so any hints are
appreciated.
I would also be grateful if anyone could advise another method for
calculating a distance matrix for a large dataset.
I appreciate that reproducible code should be provided, so try the code
below if needed:
> A<-matrix(1:365000*144,nrow=365000,ncol=144)
> dim(A)
[1] 365000 144
> d1<-dist(A,method="euclidean")
Error in dist(A, method = "euclidean") :
  negative length vectors are not allowed
Many thanks in advance!
HJ