[R] Rank-based p-value on large dataset

Sean Davis sdavis2 at mail.nih.gov
Thu Mar 3 23:49:55 CET 2005


The x's and y's are different sets--210,000 values altogether.  That is
really the issue--they can't just be sorted together, at least not as far
as I can see.

Sean
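
For reference, here is a minimal sketch of one way the per-point loop in
the quoted message below could be avoided. It is not a method proposed by
anyone in this thread. It relies on the fact that sum(y > x[i]) only
depends on how many y values are less than or equal to x[i], so it is
enough to sort y alone, once, and then look up each x with base R's
findInterval(). The helper name rank_pvalue and the simulated data are
hypothetical, and the sketch assumes numeric vectors without NAs.

```r
# Hypothetical helper (not from the thread): one-sided empirical p-values.
# Assumes x and y are numeric vectors containing no NAs.
rank_pvalue <- function(x, y) {
  y_sorted <- sort(y)                 # sort only the reference values, once
  n_le <- findInterval(x, y_sorted)   # for each x[i], the number of y <= x[i]
  (length(y) - n_le) / length(y)      # equals sum(y > x[i]) / length(y)
}

set.seed(1)                           # simulated data; sizes taken from the thread
y <- rnorm(80000)
x <- rnorm(130000)
p.val <- rank_pvalue(x, y)

# Spot-check a few points against the direct definition
idx <- 1:5
direct <- sapply(x[idx], function(xi) sum(y > xi) / length(y))
stopifnot(isTRUE(all.equal(p.val[idx], direct)))
```

Because x is only looked up against the sorted y, the two sets never need
to be merged or sorted together. Up to floating-point rounding,
1 - ecdf(y)(x) should give the same values.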

On Mar 3, 2005, at 5:38 PM, Huntsinger, Reid wrote:

> When you say the 130,000 points are from the empirical distribution, how
> did you get them? Is each one really one of the values of y? If you
> sorted y first, would you know which one (i.e., which index) each x is?
> (Sorting 80,000 elements took essentially no time at all on my
> sub-gigahertz Pentium III.) But maybe that's not an option... more
> details would help.
>
> Reid Huntsinger
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Sean Davis
> Sent: Thursday, March 03, 2005 5:22 PM
> To: r-help
> Subject: [R] Rank-based p-value on large dataset
>
>
> I have a fairly simple problem--I have about 80,000 values (call them
> y) that I am using as an empirical distribution and I want to find the
> p-value (never mind the multiple testing issues here, for the time
> being) of 130,000 points (call them x) from the empirical distribution.
> I typically do that (for a one-sided test) with something like
>
> p.val <- numeric(length(x))
> for (i in seq_along(x))
>   p.val[i] <- sum(y > x[i]) / length(y)
>
> However, length(x) is large here, as is length(y), so this process
> takes quite a long time.  Any suggestions?
>
> Thanks,
> Sean



