[R] bootstrap CI of the difference between 2 Cramer's V

varin sacha v@r|n@@ch@ @end|ng |rom y@hoo@|r
Wed Jun 8 20:35:46 CEST 2022


Dear Rui, 
Dear Daniel,

Many thanks for your code. It perfectly works.
Thanks for your answer.

Best,
S.

Envoyé de mon iPhone

> Le 6 juin 2022 à 21:17, Rui Barradas <ruipbarradas using sapo.pt> a écrit :
> 
> Hello,
> 
> Here is your code, corrected. It uses a Goodman-Kruskal gamma function in package DescTools.
> Function G probably doesn't need tryCatch but I had errors in some test runs.
> 
> 
> library(boot)
> 
> G <- function(x, index, cols = 1:2) {
>  y <- x[index, cols]    # bootstrapped data
>  g <- x$group[index]    # and groups
>  # calculate gamma for each group bootstrap sample
>  # (trap errors in case one of the groups is empty)
>  g <- tryCatch(
>    lapply(split(y[1:2], g), \(x) {
>      tbl <- table(x)
>      DescTools::GoodmanKruskalGamma(tbl)
>    }),
>    error = function(e) list(NA_real_, NA_real_)
>  )
>  # calculate difference
>  g[[1]] - g[[2]]
> }
> 
> set.seed(2022)
> # use strata parameter in function boot to resample within each group
> results <- boot(
>  data = f3,
>  statistic = G,
>  R = 2000,
>  strata = as.factor(f3$group),
>  cols = 1:2
> )
> 
> results
> boot.ci(results)
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> 
> Às 17:21 de 05/06/2022, varin sacha via R-help escreveu:
>> Dear Daniel,
>> Dear R-experts,
>> I really thank you a lot Daniel. Nobody had answered to me offline. So, thanks.
>> I have tried in the same vein for the Goodman-Kruskal gamma for ordinal data. There is an error message at the end of the code. Thanks for your help.
>> ##############################
>> library(ryouready)
>> library(boot)
>> shopping1<-c("très important","important","pas important","pas important","important","très important","important","pas important","très important","très important","important","pas important","pas important","important","très important","très important","important","pas important","pas important","important","très important","très important","important","pas important","pas important","important","très important","très important","important","pas important","pas important","important","très important","très important","important","pas important","pas important","important","très important","important")
>> statut1<-c("riche","pas riche","moyennement riche","moyennement riche","riche","pas riche","moyennement riche","moyennement riche","riche","pas riche","moyennement riche","riche","pas riche","pas riche","riche","moyennement riche","riche","pas riche","pas riche","pas riche","riche","riche","moyennement riche","riche","riche","moyennement riche","moyennement riche","moyennement riche","pas riche","pas riche","riche","pas riche","riche","pas riche","riche","moyennement riche","riche","pas riche","moyennement riche","riche")
>> shopping2<-c("important","pas important","très important","très important","important","très important","pas important","important","pas important","très important","important","important","important","important","pas important","très important","très important","important","pas important","très important","pas important","très important","pas important","très important","important","très important","important","pas important","pas important","important","pas important","très important","pas important","pas important","important","important","très important","très important","pas important","pas important")
>> statut2<-c("moyennement riche","pas riche","riche","moyennement riche","moyennement riche","moyennement riche","pas riche","riche","riche","pas riche","moyennement riche","riche","riche","riche","riche","riche","pas riche","moyennement riche","moyennement riche","pas riche","moyennement riche","pas riche","pas riche","pas riche","moyennement riche","riche","moyennement riche","riche","pas riche","riche","moyennement riche","blue","moyennement riche","pas riche","pas riche","riche","riche","pas riche","pas riche","pas riche")
>> f1 <- data.frame(shopping=shopping1,statut=statut1,group='grp1')
>> f2 <- data.frame(shopping=shopping2,statut=statut2,group='grp2')
>> f3 <- rbind(f1,f2)
>> G <- function(x, index) {
>>    # calculate goodman for group 1 bootstrap sample
>>    g1 <-x[index,][x[,3]=='grp1',]
>>    goodman_g1 <- cor(data[index,][1,2])
>>     # calculate goodman for group 2 bootstrap sample
>>    g2 <-x[index,][x[,3]=='grp2',]
>>    goodman_g2 <- cor(data[index,][3,4])
>>     # calculate difference
>>    goodman_g1-goodman_g2
>>    }
>>  # use strata parameter in function boot to resample within each group
>> results <- boot(data=f3,statistic=G, strata=as.factor(f3$group),R=2000)
>> results
>> boot.ci(results)
>> ##############################
>> Le samedi 4 juin 2022 à 09:31:36 UTC+2, Daniel Nordlund <djnordlund using gmail.com> a écrit :
>>> On 5/28/2022 11:21 AM, varin sacha via R-help wrote:
>>> Dear R-experts,
>>> 
>>> While comparing groups, it is better to assess confidence intervals of those differences rather than comparing confidence intervals for each group.
>>> I am trying to calculate the CIs of the difference between the two Cramer's V and not the CI to the estimate of each group’s Cramer's V.
>>> 
>>> Here below my toy R example. There are error messages. Any help would be highly appreciated.
>>> 
>>> ##############################
>>> library(questionr)
>>> library(boot)
>>> 
>>> gender1<-c("M","F","F","F","M","M","F","F","F","M","M","F","M","M","F","M","M","F","M","F","F","F","M","M","M","F","F","M","M","M","F","M","F","F","F","M","M","F","M","F")
>>> color1<-c("blue","green","black","black","green","green","blue","blue","green","black","blue","green","blue","black","black","blue","green","blue","green","black","blue","blue","black","black","green","green","blue","green","black","green","blue","black","black","blue","green","green","green","blue","blue","black")
>>> 
>>> gender2<-c("F","F","F","M","M","F","M","M","M","F","F","M","F","M","F","F","M","M","M","F","M","M","M","F","F","F","M","M","M","F","M","M","M","F","F","F","M","F","F","F")
>>> color2<-c("green","blue","black","blue","blue","blue","green","blue","green","black","blue","black","blue","blue","black","blue","blue","green","blue","black","blue","blue","black","black","green","blue","black","green","blue","green","black","blue","black","blue","green","blue","green","green","blue","black")
>>> 
>>> f1=data.frame(gender1,color1)
>>> tab1<-table(gender1,color1)
>>> e1<-cramer.v(tab1)
>>> 
>>> f2=data.frame(gender2,color2)
>>> tab2<-table(gender2,color2)
>>> e2<-cramer.v(tab2)
>>> 
>>> f3<-data.frame(e1-e2)
>>> 
>>> cramerdiff=function(x,w){
>>> y<-tapply(x[w,1], x[w,2],cramer.v)
>>> y[1]-y[2]
>>> }
>>> 
>>> results<-boot(data=f3,statistic=cramerdiff,R=2000)
>>> results
>>> 
>>> boot.ci(results,type="all")
>>> ##############################
>>> 
>>>   
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> I don't know if someone responded offline, but if not, there are a
>> couple of problems with your code.   First, the f3 dataframe is not what
>> you think it is.  Second, your cramerdiff function isn't going to
>> produce the results that you want.
>> I would put your data into a single dataframe with a variable
>> designating which group data came from.  Then use that variable as the
>> strata variable in the boot function to resample within groups.  So
>> something like this:
>> f1 <- data.frame(gender=gender1,color=color1,group='grp1')
>> f2 <- data.frame(gender=gender2,color=color2,group='grp2')
>> f3 <- rbind(f1,f2)
>> cramerdiff <- function(x, ndx) {
>>    # calculate cramer.v for group 1 bootstrap sample
>>    g1 <-x[ndx,][x[,3]=='grp1',]
>>    cramer_g1 <- cramer.v(table(g1[,1:2]))
>>    # calculate cramer.v for group 2 bootstrap sample
>>    g2 <-x[ndx,][x[,3]=='grp2',]
>>    cramer_g2 <- cramer.v(table(g2[,1:2]))
>>    # calculate difference
>>    cramer_g1-cramer_g2
>>    }
>> # use strata parameter in function boot to resample within each group
>> results <- boot(data=f3,statistic=cramerdiff,
>> strata=as.factor(f3$group),R=2000)
>> results
>> boot.ci(results)
>> Hope this is helpful,
>> Dan



More information about the R-help mailing list