[R] memory problem [cluster]

Roger Bivand Roger.Bivand at nhh.no
Sat Dec 2 22:11:12 CET 2006


On Sat, 2 Dec 2006, Dylan Beaudette wrote:

> Hi Stephano,

Looks like you used my example verbatim 
(http://casoilresource.lawr.ucdavis.edu/drupal/node/221)

:)

From exchanges on R-sig-geo, I believe the original questioner is feeding
NAs to clara(), and the error message in clara() is overrunning the buffer
in sprintf(), so the apparent memory problem is misidentified. Using
scripts out of context without checking whether the input data frame
satisfies the conditions of the functions being used is asking for trouble.
The error message:

 > traceback()
2: stop(ngettext(length(i), sprintf("Observation %d has", i[1]),
        sprintf("Observations %s have", paste(i, collapse = ","))),
        " *only* NAs --> omit for clustering")
1: clara(morph, k = 5, stand = F)

is coming from lines:

                i[1]), sprintf("Observations %s have", paste(i, 
                collapse = ","))), " *only* NAs --> omit for clustering")

in clara(). I have suggested dropping those rows from the data frame in a
reply on R-sig-geo, but maybe clara() could be patched to count the
completely missing rows and, if there are more than a modest number, print
only the total rather than every observation number?
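
A minimal sketch of dropping the all-NA rows before clustering, and of
reporting only a total when many rows are affected. The toy data frame, the
threshold of 20 and the names all_na, n_bad and morph_class are assumptions
made for illustration; this is not the actual code in clara().

```r
library(cluster)

# Toy stand-in for 'morph'; rows 3 and 7 are entirely NA,
# as masked raster cells would be.
set.seed(1)
morph <- data.frame(er = rnorm(10), slope_n = runif(10))
morph[c(3, 7), ] <- NA

# TRUE where every variable in the row is missing
all_na <- rowSums(is.na(morph)) == ncol(morph)

# Report only the total when many rows are affected (hypothetical
# threshold of 20), so the message does not grow with the number of rows.
n_bad <- sum(all_na)
msg <- if (n_bad > 20) {
  sprintf("%d observations have only NAs", n_bad)
} else {
  sprintf("Observations with only NAs: %s",
          paste(which(all_na), collapse = ","))
}
message(msg)

# Cluster only the rows that carry data
morph.clara <- clara(morph[!all_na, ], k = 2, stand = FALSE)

# Put the class labels back at full length, NA where there was no data
morph_class <- rep(NA_integer_, nrow(morph))
morph_class[!all_na] <- morph.clara$clustering
```

Because morph_class has one entry per original row, it can be assigned back
to the gridded object in the same way as morph.clara$clustering in the
original script.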

Roger


While my approach has not *yet* been published, the original source [4] by 
Roger Bivand certainly has. Just a reminder.

That said, I would highly recommend reading up on the background literature
associated with both the cluster package [1] and terrain classification, for
example [2] and [3]. Note that although the clara() function was created to
work on massive datasets, it is still possible to overwhelm the available
memory with multiple gridded objects; recall that all R objects are held in
memory.
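
As a rough back-of-the-envelope check, assuming the region described in the
original question (about 50 x 40 km at 20 m resolution, 7 rasters) and that
every cell is stored as an 8-byte double:

```r
# Region and resolution as described in the original question (approximate)
ncols <- 50000 / 20          # 50 km at 20 m cells
nrows <- 40000 / 20          # 40 km at 20 m cells
n_layers <- 7                # er, crosc, longc, slope, profc, minic, maxic

cells <- ncols * nrows
bytes <- cells * n_layers * 8   # R stores numeric values as 8-byte doubles

cat(format(cells, scientific = FALSE), "cells per raster\n")
cat(round(bytes / 1024^2), "MiB for one copy of the 7 variables\n")
```

That is roughly 267 MiB for a single copy. The imported grid, the matrix
built by cbind() and the resulting data frame are each likely to need at
least that much again, so 512 MB of RAM would be tight for the full region
even before clara() starts.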

I have asked the maintainer of the cluster package, Martin Maechler, about
integrating a known-medoid option into the clara() function, which would be
extremely useful for adding some 'supervision' to landscape classification
with clara(). Hopefully there will be enough requests for the feature that
Martin will kindly add it :)

1. Kaufman, L. & Rousseeuw, P.J. Finding Groups in Data: An Introduction to
Cluster Analysis. Wiley-Interscience, 2005.

2. Blaszczynski, J. Landform characterization with geographical information
systems. Photogrammetric Engineering and Remote Sensing, 1997, 63, 183-191.

3. Wood, W.F. & Snell, J.B. A Quantitative System for Classifying Landforms.
U.S. Quartermaster Research & Engineering Center, 1960.

4. Bivand, R. Integrating GRASS 5.0 and R: GIS and modern statistics.
Computers & Geosciences, 2000, 26, 1043–1052.


On Friday 01 December 2006 14:04, Massimo Di Stefano wrote:
> Hi all,
> frustrated by this error, today I bought a 1 GB memory
> module for my laptop.
> It now has 1.28 GB instead of the old 512 MB, but I get the
> same error :-(
> Damn! What can I do?
> Repeating it for a small area (about 20 x 20 km at res=20m)
> works fine!
> Do you have any suggestions?
> Is there a way to check whether this error depends on my
> RAM or on something else?
> Thanks for any suggestions!
> I need your help.
> Thanks.
> Massimo
>
>
> On 01 Dec 2006, at 16:05, massimodisasha wrote:
> Hi,
> I'm trying to perform a clustering on a big data frame.
> The code is this:
>
>
> print("load required R packages")
>
> require(spgrass6)
>
> require(cluster)
>
> gmeta6 <- gmeta6()
>
> print("read in our 7 raster files from GRASS")
>
> x <-
> readFLOAT6sp(c("er","crosc","longc","slope","profc","minic","maxic"))
>
> print("assemble a matrix of our terrain variables")
>
> morph <- data.frame(cbind(x$er, x$crosc, x$longc,
> x$slope, x$profc, x$minic, x$maxic))
>
> print("normalize slope by dividing by max(slope)")
>
> morph <- data.frame(cbind(x$er, x$crosc, x$longc,
> x$slope/max(x$slope), x$profc, x$minic, x$maxic))
>
> names(morph) <-
> c("er","crosc","longc","slope_n","profc","minic","maxic")
>
> print("perform the clustering")
>
> morph.clara <- clara(morph, k=5, stand=F)
>
> x$morph_class <- morph.clara$clustering
>
> print("send result back to GRASS")
>
> rast.put6(x,"morph", zcol="morph_class")
>
>
>
> During the "perform the clustering" step,
> after a long time,
> I get this error:
>
>
>
>
> Error in sprintf(fmt, ...) : string length exceeds
> buffer size of 8192
> In addition: Warning messages:
> 1: perl = TRUE is only implemented in UTF-8 locales
> 2: perl = TRUE is only implemented in UTF-8 locales
> 3: perl = TRUE is only implemented in UTF-8 locales
> 4: perl = TRUE is only implemented in UTF-8 locales
> 5: perl = TRUE is only implemented in UTF-8 locales
> 6: perl = TRUE is only implemented in UTF-8 locales
> 7: perl = TRUE is only implemented in UTF-8 locales
> 8: character string will probably be
> truncated
> Execution halted
>
>
>
> If I try the same code on a subregion of my data, it
> works fine,
> but for a large region I get this error :-(
>
> I assume this is a memory problem, right?
> (I'm working on a notebook: PPC 1.33 GHz, 512 MB RAM.)
> My data are 7 raster maps covering a region of about 50 x 40
> km at a resolution of 20 m.
> Is there a workaround for the memory problem?
>
> Another question:
> what are these?
> Warning messages:
> 1: perl = TRUE is only implemented in UTF-8 locales
> 2: perl = TRUE is only implemented in UTF-8 locales
> 3: perl = TRUE is only implemented in UTF-8 locales
> 4: perl = TRUE is only implemented in UTF-8 locales
> 5: perl = TRUE is only implemented in UTF-8 locales
> 6: perl = TRUE is only implemented in UTF-8 locales
> 7: perl = TRUE is only implemented in UTF-8 locales
>
> Are they related to this line of the code:
>
> morph.clara <- clara(morph, k=5, stand=F)
> where I have F for FALSE?
>
>
> Thanks for any suggestions,
>
> Massimo
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented, minimal,
> self-contained, reproducible code.

-- 
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341


-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



