[R] Rank-based p-value on large dataset

Thu Mar 3 23:38:50 CET 2005

When you say the 130,000 points are from the empirical distribution, how did
you get them? Is each x really one of the values of y? If you sorted y
first, would you know which element (i.e. which index) of the sorted y each
x corresponds to? (Sorting 80,000 elements took essentially no time at all on
my sub-gigahertz Pentium III.) But maybe that's not an option; more details
would help.
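
The sorting idea can be sketched in R as follows. This is only a rough
illustration, not necessarily what either poster had in mind: it assumes the
x values need not coincide with any y value, it uses simulated stand-in data
(the rnorm() calls are hypothetical placeholders for the real x and y), and
it relies on findInterval() from base R to count, for each x[i], how many
sorted y values are <= x[i].

```r
# Hypothetical stand-in data; the real x and y come from the original poster.
set.seed(1)
y <- rnorm(80000)
x <- rnorm(130000)

# Sort the reference values once.
y.sorted <- sort(y)

# For each x[i], findInterval() returns the number of elements of
# y.sorted that are <= x[i], located by binary search.
n.le <- findInterval(x, y.sorted)

# One-sided p-value: the fraction of y strictly greater than x[i],
# i.e. the same quantity as sum(y > x[i]) / length(y).
p.val <- (length(y) - n.le) / length(y)
```

This replaces one full pass over y per point with one sort plus a binary
search per point. Assuming the same definition of the p-value,
1 - ecdf(y)(x) should give the same numbers.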

Reid Huntsinger

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Sean Davis
Sent: Thursday, March 03, 2005 5:22 PM
To: r-help
Subject: [R] Rank-based p-value on large dataset

I have a fairly simple problem--I have about 80,000 values (call them
y) that I am using as an empirical distribution and I want to find the
p-value (never mind the multiple testing issues here, for the time
being) of each of 130,000 points (call them x) from the empirical
distribution. I typically do that (for a one-sided test) with something like

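p.val <- numeric(length(x))               # preallocate the result vector
for (i in seq_along(x)) {
  p.val[i] <- sum(y > x[i]) / length(y)   # fraction of y values above x[i]
}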

and repeat for all i.  However, length(x) is large here as is 
length(y), so this process takes quite a long time.  Any suggestions?

Thanks,
Sean
