[R] millions of comparisons, speed wanted

Sat Dec 17 14:57:09 CET 2005

The daisy function is _very_ good!
I have been able to use it for nominal variables as well, simply by:
daisy(input)*ncol(input)
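
To spell out why that works, here is a minimal sketch on made-up data (the data frame and its column names are only for illustration). It assumes every column is a factor and there are no NAs. In that case daisy() switches to Gower's coefficient on its own, which for nominal variables is just the fraction of columns on which two rows differ, so multiplying by ncol(input) gives back the number of differing columns:

```r
library(cluster)

# Hypothetical toy data: three cases, four nominal variables stored as factors.
input <- data.frame(
  v1 = factor(c("a", "a", "b")),
  v2 = factor(c("x", "y", "y")),
  v3 = factor(c("m", "n", "m")),
  v4 = factor(c("p", "q", "q"))
)

# With non-numeric columns daisy() uses Gower's coefficient automatically;
# for all-nominal data this is the fraction of variables that differ.
d <- daisy(input)

# Rescale the fractions to counts of differing variables per pair of rows.
mismatches <- d * ncol(input)
as.matrix(mismatches)
```

With NAs in the data the rescaling is no longer exact, since daisy() then divides by the number of variables actually compared for each pair rather than by ncol(input).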

Now, for a very large number of rows (say 5000), daisy runs for about 3
minutes and goes into swap space. I probably need more RAM (only 512 MB on my
computer). But at least I get a result... :)
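
A rough back-of-the-envelope check of the memory involved, assuming the result is stored as one 8-byte double per pair of rows (as for a dist object); whatever working copies daisy() makes internally come on top of this:

```r
n <- 5000

# Number of distinct row pairs, i.e. entries in the lower triangle.
n_pairs <- n * (n - 1) / 2

# Storage for the result alone, at 8 bytes per dissimilarity.
bytes <- n_pairs * 8

n_pairs        # 12497500 pairwise comparisons
bytes / 2^20   # size of the result in MiB, roughly 95
```

So the result alone takes a good part of 512 MB once R itself and the input are counted, which would be consistent with the swapping.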

For relatively small input matrices, it increased the speed by a
factor of 3. Way to go!

Best,
Adrian

On 12/16/05, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> I have not taken the time to look into this example,
> but
>        daisy()
> from the (recommended, hence part of R) package 'cluster'
> is more flexible than dist(), particularly in the case of NAs
> and for (a mixture of continuous and) categorical variables.
>
> It uses a version of Gower's formula in order to deal with NAs
> and asymmetric binary variables.  The example below looks like
> a very good match for this problem.
>
> Regards,
> Martin Maechler, ETH Zurich