[R] Fast R implementation of Gini mean difference

Adelchi Azzalini azzalini at stat.unipd.it
Mon Apr 28 10:55:08 CEST 2003


This is to complement my previous contribution on computation of Gini mean  
difference - a discussion started by Andrew Ward. The index is "defined" as
    gini  <- 0
      for (i in 1:n) 
         {
         for (j in 1:n)  gini <- gini + freq[i]*freq[j]*abs(x[i]-x[j])
         }
    gini<- gini/((sum(freq)-1)*sum(freq))

This is  the so-called form "without repetition"; the variant "with repetition"
does not have -1 in the final line.

Since computaation via the definition is totally inefficient, alternative
approaches have been put forward, following Andrew's message.

My first version of a computationally convenient implementation was
essentially this:

gini.md0<- function(x)
 { # x=data vector
   n <-length(x)
   return(4*sum((1:length(x))*sort(x)/(n*(n-1)))
       -2*mean(x)*(n+1)/(n-1))
  }


Since Andrew (private message) has stressed the importance in his problem
of allowing for replicated data, here is a more general version, obtained by 
elaborating on the previous one with a bit of algebra:

gini.md <- function(x, freq=rep(1,length(x)))
{# x=data vector, freq=vector of frequencies
  if(!is.vector(x)) stop("x must be a vector")
  if(length(x) != length(freq)) 
       stop("x and freq must have same length")  
  if(min(freq)<0 | sum(freq)==0 | any(freq != as.integer(freq)) ) 
             stop("freq must be counts")
     x <- x[freq>0]
     freq <- freq[freq>0]
     j <- order(x)
     x <- x[j]
     n <- as.integer(freq[j])
     n. <- sum(n)
     u <- (cumsum(n)-n)*n+ n*(n+1)/2
     return(4*sum(u*x)/(n.*(n.-1))
         -2*weighted.mean(x,n)*(n.+1)/(n.-1))
}
  
Notice that gini.md(x,freq) gives the same of mini.md0(rep(x,freq)), but the latter 
is obviously less efficient. Either are however far more efficient that straight
implementation of the "definition".

regards

Adelchi Azzalini

-- 
Adelchi Azzalini  <azzalini at stat.unipd.it>
Dipart.Scienze Statistiche, Università di Padova, Italia
http://azzalini.stat.unipd.it/



More information about the R-help mailing list