[R] Some code to run Hutcheson t-test using R

Thu Sep 10 13:04:17 CEST 2020

Hello,

Sorry, there's an instruction missing. See inline.

Às 11:44 de 10/09/20, Rui Barradas escreveu:
> If you want a function automating Karl's code, here it is. It returns an 
> object of S3 class "htest", R's standard for hypothesis tests functions. 
> The returned object can then be subset in the usual ways, ht$statistic, 
> ht$parameter, ht$p.value, etc.
> 
> 
> library(QSutils)
> 
> hutcheson.test <- function(x1, x2){
>    dataname1 <- deparse(substitute(x1))
>    dataname2 <- deparse(substitute(x2))
>    method <- "Hutcheson's t-test for Shannon diversity equality"
>    alternative <- "the diversities of the two samples are not equal"
>    h1 <- Shannon(x1)
>    varh1 <- ShannonVar(x1)
>    n1 <- sum(x1)
>    h2 <- Shannon(x2)
>    varh2 <- ShannonVar(x2)
>    n2 <- sum(x2)
>    degfree <- (varh1 + varh2)**2 / (varh1**2/n1 + varh2**2/n2)
>    tstat <- (h1 - h2)/sqrt(varh1 + varh2)
>    p.value <- 2*pt(-abs(tstat), df = degfree)
>    ht <- list(
>      statistic = c(t = tstat),
>      parameter = c(df = degfree),
>      p.value = p.value,
>      alternative = alternative,
>      method = method,
>      data.name = paste(dataname1, dataname2, sep = ", ")
>    )
>    class(ht) <- "htest"
>    ht
> }
> 
> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
> 
> hutcheson.test(earlier, later)
> 
> 
> 
> With the data you provided:
> 
> 
> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
> bird_1959 <- c(0,0,14,59,26,68,0)
> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
> 
> hutcheson.test(bird_1956, bird_1957)
> 
> 
> 
> 
> Note that like David said earlier, there might be better ways to 
> interpret Shannon's diversity index. If h is the sample's diversity, 
> exp(h) gives the number of equally-common species with equivalent 
> diversity.
> 
> 
> s1 <- Shannon(earlier)
> s2 <- Shannon(later)
> c(earlier = s1, later = s2)
> exp(c(earlier = s1, later = s2))   # Both round to 3
> eq_common <- rep(1, 3)             # Can be 1 specimen or any other number
> Shannon(eq_common)                 # Slightly greater than the samples' 
> diversity
> 
>

# Create a list with all the data
birds <- mget(ls(pattern = "^bird"))

> round(exp(sapply(birds, Shannon))) # Your data

Hope this helps,

Rui Barradas

> 
> 
> #-------------------------------------
> 
> 
> Earlier Karl wrote [1] that
> 
> 
> Here var(h) is calculated as in ref 1 cited by Rui Barradas - I guess
> that explains the minor numerical differences obtained with the code
> above and the published variances.
> 
> 
> I don't believe the published variances were computed with the published 
> variance estimator. The code below computes the variances like QSutils 
> and with formula (4) in Hutcheson's paper. The latter does not give the 
> same results.
> 
> var_est <- function(n){
>    s <- length(n)
>    N <- sum(n)
>    p <- n/N
>    i <- p != 0
>    inv.p <- numeric(s)
>    inv.p[i] <- 1/p[i]
>    log.p <- numeric(s)
>    log.p[i] <- log(p[i])
>    #
>    term1 <- (sum(p * log.p^2) - sum(p * log.p)^2)/N
>    term2 <- (s - 1)/(2*N^2)
>    #
>    numer3 <- -1 + sum(inv.p) - sum(inv.p * log.p) + sum(inv.p)*sum(p * 
> log.p)
>    denom3 <- 6*N^3
>    term3 <- numer3/denom3
>    list(
>      Bioc = term1 + term2,
>      Hutch = term1 + term2 + term3
>    )
> }
> 
> Vh1 <- var_est(earlier)
> Vh1
> all.equal(ShannonVar(earlier), Vh1$Bioc)
> ShannonVar(earlier) - Vh1$Bioc            # FAQ 7.31
> 
> Vh2 <- var_est(later)
> Vh2
> identical(ShannonVar(later), Vh2$Bioc)    # TRUE
> 
> 
> 
> [1] https://stat.ethz.ch/pipermail/r-help/2020-September/468664.html
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> 
> Às 09:38 de 10/09/20, Luigi Marongiu escreveu:
>> Update:
>> I can see that you used the function Shannon from the package QSutils.
>> This would supplement the iNext package I used and solve the problem.
>> Thank you.
>>
>> On Thu, Sep 10, 2020 at 10:35 AM Luigi Marongiu
>> <marongiu.luigi using gmail.com> wrote:
>>>
>>> Thank you very much for the code, that was very helpful.
>>> I got the article by Hutcheson -- I don't know if I can distribute it
>>> , given the possible copyrights, or if I can attach it here -- but it
>>> does not report numbers directly: it refers to a previous article
>>> counting bird death on a telegraph each year. The numbers
>>> are:
>>> bird_1956 <- c(4,4,190,135,56,3,2,2,1,12,41,201,1,0,131,3)
>>> bird_1957 <- c(4,111,53,66,146,222,126,61,0,2323,21)
>>> bird_1958 <- c(0,3,32,228,56,102,0,11,2,220,0)
>>> bird_1959 <- c(0,0,14,59,26,68,0)
>>> bird_1960 <- c(0,0,73,66,71,25,0,109,63,1)
>>>
>>> This for sake of the argument.
>>> As for my problem, I implemented the Shannon index with the package
>>> iNext, which only gives me the index itself and the 95% CI. Even when
>>> I implemented it with vegan, I only got the index. Essentially I don't
>>> have a count of species I could feed into the Hutcheson's. Is there a
>>> way to extract these data? Or to run a Hutcheson's on the final index?
>>> Thank you
>>>
>>> On Tue, Sep 8, 2020 at 7:43 PM Karl Schilling
>>> <karl.schilling using uni-bonn.de> wrote:
>>>>
>>>> Dear Luigi,
>>>>
>>>> below some code I cobbled together based on the Hutcheson paper you
>>>> mentioned. I was lucky to find code to calculate h and, importantly, 
>>>> its
>>>> variance in the R-package QSutils - you may find it on the Bioconductor
>>>> website.
>>>>
>>>> here is the code, along with an example. I also attach the code as an
>>>> R-file.
>>>>
>>>> Hope that helps.
>>>> All my best
>>>>
>>>> Karl
>>>> PS don't forget to adjust for multiple testing if you compare more than
>>>> two groups.
>>>> K
>>>>
>>>>
>>>> # load package needed
>>>> # QSutils is on Bioconductor
>>>> library(QSutils)
>>>>
>>>> # here some exemplary data - these are the data from Pilou 1966 that 
>>>> are
>>>> used
>>>> # in the second example of Hutcheson, J theor Biol 129:151-154 (1970)
>>>>
>>>> earlier <- c(0,0,146,0,5,46,0,1,295,0,0,3,0,0,0,0,0)
>>>> later <- c(0,0,142,0,5,46,0,1,246,0,0,3,0,0,0,0,0)
>>>> # numbers of the first example used by Hutcheson were unfortunately not
>>>> # available to me
>>>>
>>>> # here starts the code ; you may replace the variables "earlier" and 
>>>> "later"
>>>> # by your own numbers.
>>>>
>>>> # calculate h, var(h) etc
>>>> h1 <- Shannon(earlier)
>>>> varh1 <- ShannonVar(earlier)
>>>> n1 <- sum (earlier)
>>>> h2 <- Shannon(later)
>>>> varh2 <- ShannonVar(later)
>>>> n2 <- sum (later)
>>>> degfree <- (varh1 + varh2)**2 /(varh1**2/n1 + varh2**2 /n2)
>>>>
>>>> # compare numbers with those in the paper
>>>> h1
>>>> h2
>>>> varh1
>>>> varh2
>>>> # I assume that minor numerical differences are due to differences 
>>>> in the
>>>> # numerical precision of computers in the early seventies and today 
>>>> / KS
>>>>
>>>> # this is the actual t-test
>>>> t <- (h1-h2) /sqrt(varh1 + varh2)
>>>> p <- 2*pt(-abs(t),df= degfree)
>>>> p
>>>>
>>>> # that's it
>>>> # Best
>>>> # Karl
>>>> -- 
>>>> Karl Schilling, MD
>>>> Professor of Anatomy and Cell Biology
>>>> Anatomisches Institut
>>>> Rheinische Friedrich-Wilhelms-Universität
>>>> Nussallee 10
>>>>
>>>> D-53115 Bonn
>>>> Germany
>>>>
>>>> phone ++49-228-73-2602
>>>>
>>>
>>>
>>> -- 
>>> Best regards,
>>> Luigi
>>
>>
>>
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.