FW: [R] Optimising code

Adaikalavan RAMASAMY ramasamya at gis.a-star.edu.sg
Tue Oct 7 13:46:51 CEST 2003


If you are only interested in calculating the wilcoxon statistic (as one
would with calculating permutated p-values for example), the following
should suffice.

length.na <- function(x, ...){
  tmp <- !(is.na(x) | is.infinite(x))
  length(x[tmp], ...)
}

rank.na <- function(x){
  y <- x[!is.na(x)]; 
  x[!is.na(x)] <- rank(y);
  return(x)
}

wstats.fn <- function(data=NULL, g1=NULL, g2=NULL){
  temp <- t( apply(data, 1, rank.na));
  n1 <- apply(temp[ ,g1], 1, length.na);
  n2 <- apply(temp[ ,g2], 1, length.na);

  wstats <- rowSums( temp[,g1], na.rm=TRUE ) - n1*(n2 + 1)/2;
  return(wstats);
}

m <- matrix( rnorm( 2500*100 ), ncol=100 )
w1 <- wstats.fn(data=m, g1=1:50, g2=51:100)
w2 <- apply(m, 1, function(x) wilcox.test( x[1:50], x[51:100] )$stat)

On my machine, it take 2.2s to calculate w1 compared to 15.6s for w2.
rank.na and length.na are not necessary if your data does not have
missing values. 

Since you are doing large enough times it might be wise to code in C. My
C version of this is at least 20-40 times faster than the one in R but
like the one above it does not return p-values.


--
Adaikalavan Ramasamy 


-----Original Message-----
From: Uwe Ligges [mailto:ligges at statistik.uni-dortmund.de] 
Sent: Tuesday, October 07, 2003 6:18 PM
To: Crispin Miller
Cc: R-help (E-mail)
Subject: Re: FW: [R] Optimising code


Crispin Miller wrote:

>>>I have a function that applies a wilcoxon test to 12 sets of about a 
>>>quarter of a million pairs
> 
> 
>>... and let me guess: everything is significiant to an almost 
>>arbitrary
>>value of \alpha?
> 
> 
> :-) For each of quarter of a million sets, I do a wilcoxon between two

> pairs each containing twenty numbers... I do this 12 times...

Ah. Sorry for misunderstanding.


>>>(and takes about 3 hours). I've replaced the inner loop I
>>
>>had originally with a function call via mapply, and also
>>considered different approximations  of the wilcoxon, rather 
>>than that which is implemented in wilcox.test, but that makes 
>>little difference 9and if anything slows things down :-).
>>
>>Are you using wilcox.test(), or a self written function?
>>
> 
> 
> I started off with wilcox.test, and tried to speed it up by hacking 
> out as much from the original function as possible, since a fair 
> proportion of the code being executed was logic deciding between the 
> different flavours of the test (I didn't expect that to make much 
> difference, but tried it anyway), there was also a call to pnorm() 
> which I replaced with a numeric approximation. This is not inlined 
> into the original function.


In that case you cannot do very much except for writing it in a language

like C / Fortran, if you really need the speed.

Uwe Ligges

> Crispin
>  
> --------------------------------------------------------
> 
>  
> This email is confidential and intended solely for the use 
> o...{{dropped}}
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help




More information about the R-help mailing list