[R] Bug in truncgof package?

Sun Jun 3 17:46:51 CEST 2012

Dear Carlos, Duncan and everyone

You may have already sorted the matter by now, but since I have not seen
anything posted since Duncan's reply, here I go. I apologize in advance
for the spam, if it turns out I've missed some post.

I think the test and the implementation of the truncgof package are just
fine. I've done Carlos' experiment (repeatedly generating samples and
testing them with AD) and the test never rejected the null hypothesis.

@Carlos: in the first piece of code you are using ad.test (> ad.test(xt,
"plnorm", list(meanlog = 2, sdlog = 2), H = 10)), but in the function
that you are later defining for iteration purposes, you are use the KS
test (>       ks.test(xt, "plnorm", list(meanlog = 2, sdlog = 2), H =
10)$p.value)... This may have something to do with the contradictory
results that you thought you were getting.

Cheers
Andreea

Carlos J. Gil Bellosta wrote:
> Dear R-helpers,
>
> I was testing the truncgof CRAN package, found something that looked
> like a bug, and did my job: contacted the maintainer. But he did not
> reply, so I am resending my query here.
>
> I installed package truncgof and run the example for function ad.test. I
> got the following output:
>
> set.seed(123)
> treshold <- 10
> xc  <- rlnorm(100, 2, 2)    # complete sample
> xt <- xc[xc >= treshold]    # left truncated sample
> ad.test(xt, "plnorm", list(meanlog = 2, sdlog = 2), H = 10)
>
>
>     Supremum Class Anderson-Darling Test
>
> data:  xt
> AD = 3.124, p-value = 0.12
> alternative hypothesis: two.sided
>
> treshold = 10, simulations: 100
>
>
> So I cannot reject the hipothesis (at a standard confidence level) that
> the original sample comes from a lognormal distribution (as it is the
> case).
>
> But let us try to iterate on this example:
>
> set.seed( 123 )
> treshold <- 10
>
> foo <- function(){
>       xc  <- rlnorm(100, 2, 2)     # complete sample
>       xt <- xc[xc >= treshold]     # left truncated sample
>       ks.test(xt, "plnorm", list(meanlog = 2, sdlog = 2), H =
> 10)$p.value
> }
>
> results <- replicate( 100, foo() )
>
>
> Then:
>
>  
>> table( results )
>>    
> results
>    0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09  0.1 0.11 0.16 0.18
> 0.19  0.2
>   25    7    9    3    1    2    3    4    1    1    2    2    1    1
> 3    2
> 0.21 0.22 0.26 0.27 0.28  0.3 0.31 0.32 0.33 0.36 0.38  0.4 0.44 0.49
> 0.54 0.55
>    2    2    1    3    1    2    1    1    1    2    1    2    1    1
> 2    1
> 0.56 0.57 0.62  0.7 0.76 0.78 0.96 0.98
>    1    2    1    1    1    1    1    1
>
>
> This is, in a 45% of the cases, you would reject the H_0 hypothesis,
> which happens to be true, at the 5% "standard" confidence level.
>  

That looks to me that the test as implemented is not very good.  This
could be an implementation bug, but it could also be a limitation of the
test itself.  I don't know the theory underlying this particular test,
but a way to determine it is in implementation bug is to carefully
implement the test and see if you got the same answer.

Duncan Murdoch

> Do you think this behaviour is buggy? If so, given that the maintainer
> does not seem to be contactable, what would be the next step to take?
>
> Best regards,
>
> Carlos J. Gil Bellosta
> http://www.datanalytics.com
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
--
Andreea Picu
Communication Systems Research Group, ETH Zurich
Web: http://www.csg.ethz.ch/people/apicu
Office: ETZ G 96, Gloriastrasse 35, 8092 Zurich
Phone: +41 44 632 6894  Fax: +41 44 632 1035