[R] ks.test()

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Aug 28 14:30:02 CEST 2003


You appear to be applying the KS test after estimating parameters.  The 
distribution theory is for an iid sample from a known continuous 
distribution (and does not otherwise depend on the distribution).  Since 
your H_0 is not pre-specified, that distribution theory is not correct.
(Some corrections have been worked out for say ML fitting of exponential 
and normal distributions -- by Michael Stephens as I recall.)

Also, your `truncated LogNormal' does not appear to be truncated, rather
to be shifted.  That's the same thing for an exponential (for a positive
shift), but not for any other distribution.

On Thu, 28 Aug 2003, franck allaire wrote:

> Dear All
> 
> I am trying to replicate a numerical application (not computed on R) from an 
> article. Using, ks.test() I computed the exact D value shown in the article 
> but the p-values I obtain are quite different from the one shown in the 
> article.

And what is the H_0 and H_1 used in the article?

> The tests are performed on a sample of 37 values (please see "[0] DATA" 
> below) for truncated Exponential, Pareto and truncated LogNormal (please see 
> "[1] FUNCTIONS" below). You can find below the commands I use to compute the 
> tests ("[2] COMMANDS") and the p-values presented in the article.
> 
> Can anyone explain the difference between these two set of p-values? Maybe 
> the explanation is linked to my second question: Can anyone confirm me that 
> p-values calculated in the Kolmogrov-Smirnov g-o-f are independent from the 
> family of the distribution assumed in H0?
> 
> Anderson-Darling test is also performed in the article. would anyone have 
> functions to calculate pvalues for exponential, lognormal, weibull, etc?
> 
> Many thanks for your help,
> 
> Franck Allaire
> R 1.6.2
> 
> ###############
> # [0] DATA # truncation point is 30 i.e. all values are above 30
> ###############
> 39.7
> 582
> 439.9
> 47.1
> 83.9
> 41.2
> 893.1
> 192
> 106.2
> 216.7
> 1243.4
> 52.8
> 351.6
> 36.2
> 431.5
> 57.3
> 1602.1
> 822.2
> 260.1
> 58.7
> 6299.9
> 814.9
> 137.2
> 203.8
> 1263.5
> 53.7
> 1313
> 118.4
> 167.8
> 70.1
> 503.7
> 64.8
> 529.9
> 87.8
> 2465.4
> 317.9
> 2753.9
> 
> ###############
> # [1] FUNCTIONS
> ###############
> 
> #EXP
> exptfit_function(x,b, plot.it = F, lty = 1){
> 
>           if (mode(x) != "numeric")
>                 stop("need numeric data")
>           x <- x[!is.na(x)]
>           x <- sort(x)
>           lambda <- length(x)/sum(x-b)
>           if(plot.it) { lines(x, dexpt(x, lambda = lambda, b = b), lty = 
> lty,col="red")}
>           result <- list(lambda = lambda, b = b)
>           class(result) <- "expt"
>           result}
> 
> dexpt<-function(x,lambda,b) dexp(x-b,rate=lambda)
> pexpt<-function(x,lambda,b) pexp(x-b,rate=lambda)
> qexpt<-function(p,lambda,b) qexp(p,rate=lambda)+b
> rexpt<-function(n,lambda,b) rexp(n,rate=lambda)+b
> 
> #PARETO
> paretofit<-function(x,b, plot.it = F, lty = 1){
> 
>           x <- sort(x)
>           alpha <- length(x)/sum(log(x/b))
>           if(plot.it) { lines(x, dpareto(x, alpha = alpha, b = b), lty = 
> lty,col="red")}
>           result <- list(alpha = alpha, b = b)
>           class(result) <- "pareto"
>           result}
> 
> dpareto<-function(x, alpha,b) ifelse(x<b,0,alpha*(b^alpha)/(x^(alpha+1)))
> ppareto<-function(x, alpha,b) ifelse(x<b,0,1-(b/x)^alpha)
> qpareto<-function(p, alpha,b) b*exp(-log(1-p)/alpha)
> rpareto<-function(n, alpha,b) qpareto(runif(n),alpha=alpha,b=b)
> 
> #LN
> lnormtfit_function(x,b, plot.it = F, lty = 1){
> 
>           if (mode(x) != "numeric")
>                 stop("need numeric data")
>           x <- x[!is.na(x)]
>           x <- sort(x)
>           y <- log(x-b)
>           n <- length(y)
>           mu <- mean(y)
>           sigma <- sd(y)
>           if(plot.it) { lines(x, dlnormt(x, meanlog=mu,sdlog=sigma,b=b), lty 
> = lty,col="red")}
>           result <- list(mu=mu,sigma=sigma,b=b)
>           class(result) <- "lnormt"
>           result}
> 
> dlnormt<-function(x,mu,sigma,b) dlnorm(x-b,meanlog=mu,sdlog=sigma)
> plnormt<-function(x,mu,sigma,b) plnorm(x-b,meanlog=mu,sdlog=sigma)
> qlnormt<-function(p,mu,sigma,b) qlnorm(p,meanlog=mu,sdlog=sigma)+b
> rlnormt<-function(n,mu,sigma,b) rlnorm(n,meanlog=mu,sdlog=sigma)+b
> 
> ###############
> # [2] COMMANDS
> ###############
> 
> # EXP
> tmp <- exptfit(data,b=30)
> ks.test(data,"pexpt",lambda=tmp$lambda,b=tmp$b)
> # Article: D= 0.2599 pvalue<<0.005
> 
> # PARETO
> tmp <- paretofit(data,b=30)
> ks.test(data, "ppareto", alpha=tmp$alpha,b=tmp$b)
> # Article: D=0.14586 pvalue=0.16
> 
> # LN
> tmp <- lnormtfit(data,b=30)
> ks.test(data,"plnormt",mu=tmp$mu,sigma=tmp$sigma,b=tmp$b)
> # Article: D=0.07939 pvalue>>0.15
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-help mailing list