[R] Visualising multiple response contingency tables
ilai
keren at math.montana.edu
Wed Mar 14 06:03:36 CET 2012
Not sure I understand your question (or if there is one), and I am not
familiar with vcd::mosaic. But if you are asking whether there is a simpler
way, then yes:
1. work with ?array and ?aperm rather than tables
2. create the array directly in R from the original data, not in Excel
3. ?mosaicplot (no extra package required; it's in graphics)
Here is what I mean, based on your f.tbl:
f.tbl = structure(c(10, 15, 25, 45, 30, 50), .Dim = 2:3,
                  .Dimnames = list(Sex = c("F", "M"),
                                   Responses = c("A", "B", "total subjects")),
                  class = "table")
# Calculate the No-A No-B columns:
(ff.tbl <- rbind(f.tbl[,1:2],f.tbl[,3]-f.tbl[,1:2]))
# rearrange to a CxRxB (in this case 2x2x2) array:
dim(ff.tbl) <- c(2,2,2)
# give some names
dimnames(ff.tbl) <- list(Sex=c('F','M'),Answer=c('yes','no'),Response=c('A','B'))
ff.tbl
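For tables with more rows or more response columns, the same rearrangement can be wrapped in a small function. This is only a sketch of one possible way: `to_yes_no_array` and the `Answer` dimension name are made up for illustration, and it assumes the last column of the table holds the number of subjects per row.

```r
# Example data from this thread
f.tbl <- structure(c(10, 15, 25, 45, 30, 50), .Dim = 2:3,
                   .Dimnames = list(Sex = c("F", "M"),
                                    Responses = c("A", "B", "total subjects")),
                   class = "table")

# Hypothetical helper: turn an r x (c + 1) table of "yes" counts plus a
# totals column into an r x 2 x c array of yes/no counts.
to_yes_no_array <- function(tbl, total_col = ncol(tbl)) {
  m   <- unclass(tbl)                       # plain matrix, dimnames kept
  yes <- m[, -total_col, drop = FALSE]      # r x c "yes" counts
  no  <- m[, total_col] - yes               # totals are recycled down each column
  vars <- names(dimnames(m))
  if (is.null(vars)) vars <- c("Row", "Item")
  arr <- array(NA_real_, dim = c(nrow(yes), 2, ncol(yes)),
               dimnames = setNames(list(rownames(yes), c("yes", "no"), colnames(yes)),
                                   c(vars[1], "Answer", vars[2])))
  arr[, 1, ] <- yes
  arr[, 2, ] <- no
  arr
}

ff <- to_yes_no_array(f.tbl)
ff
```

The result has the same layout as ff.tbl above, so it can be passed to mosaicplot() or apply() in the same way.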
# plot
mosaicplot(ff.tbl)
# or plot
mosaicplot(aperm(ff.tbl,3:1))
# or test
apply(ff.tbl, 3 , chisq.test) # and sum the result
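To spell out the "sum the result" step, here is a minimal sketch. It assumes the uncorrected Pearson statistic is wanted for each 2x2 slice (hence correct = FALSE, because chisq.test() applies Yates' continuity correction to 2x2 tables by default) and it uses the df = c(r - 1) reference distribution described in the quoted message. The names stats, X2, df and p are only for illustration.

```r
# Rebuild the 2x2x2 array so this snippet runs on its own
ff.tbl <- array(c(10, 15, 20, 35, 25, 45, 5, 5), dim = c(2, 2, 2),
                dimnames = list(Sex = c("F", "M"), Answer = c("yes", "no"),
                                Response = c("A", "B")))

# One Pearson chi-square statistic per response item (third dimension).
# suppressWarnings(): the B slice has an expected count below 5, so
# chisq.test() warns that the approximation may be poor.
stats <- suppressWarnings(
  apply(ff.tbl, 3, function(m) unname(chisq.test(m, correct = FALSE)$statistic))
)

X2 <- sum(stats)                              # summed (modified) statistic
df <- dim(ff.tbl)[3] * (dim(ff.tbl)[1] - 1)   # c * (r - 1)
p  <- pchisq(X2, df = df, lower.tail = FALSE)
round(c(X2 = X2, df = df, p = p), 4)
```

Whether the chi-square reference distribution is adequate for your data is a separate question; the small expected counts in the B slice are a reason to be careful.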
Hope this helps get you started
> > f.tbl
>    Responses
> Sex  A  B total subjects
>   F 10 25             30
>   M 15 45             50
>
>
> The answer I have is to adjust my data and then use the mosaic() function
> in package:vcd; however, I'm not sure that's the best way forward and I
> don't have a very efficient way of getting there. I will present my
> solution so you guys can take a look.
>
> The fundamental problem is that because of the multiple response data, you
> can't simply apply a normal Chi-square test to the contingency table.
> There's a raft of approaches, but I've decided to use a simple technique
> introduced by (A. Agresti, I. Liu, Modeling a categorical variable allowing
> arbitrarily many category choices, Biometrics 55 (1999) 936-43.) and
> refined by Thomas and Decady and Bilder and Loughin. In summary, the test
> statistic (a modified Chi square statistic) is calculated by summing up the
> individual chi-square statistics for each of the c marginal r × 2 tables
> relating the single response variable to the multiple response variable
> (with df = c(r - 1)). Note that instead of using the row totals (total
> number of responses) the test statistic is calculated with the total number
> of subjects per row.
>
> (phew, I hope that made sense :) ) Unfortunately, my google-research has
> not revealed an easy way to transform my one data table into c x r x 2
> tables for analysis. So I end up having to create the two different tables
> myself, shown below (note that the Not-A/B columns are calculated as the
> difference between the main data column (A/B) and the total number of
> subjects listed above).
>
> > g.mtrx = matrix(c(10, 15, 20, 35), nrow = 2)
> > g.tbl = as.table(g.mtrx)
> > dimnames(g.tbl) = list(Sex = c("F", "M"), Responses = c("A", "Not-A"))
> > g.tbl
>    Responses
> Sex  A Not-A
>   F 10    20
>   M 15    35
>
> > h.mtrx = matrix(c(25, 45, 5, 5), nrow = 2)
> > h.tbl = as.table(h.mtrx)
> > dimnames(h.tbl) = list(Sex = c("F", "M"), Responses = c("B", "Not-B"))
> > h.tbl
>    Responses
> Sex  B Not-B
>   F 25     5
>   M 45     5
>
>
> If I then perform the normal Chi-square test on each of the two tables
> (chisq.test()) and then sum up the results, I get the answer I want.
> Clearly this is cumbersome, which is why I do it in Excel at the moment (I
> know, shame on me). However, I really want to take advantage of the mosaic
> function in vcd. So what I have to do at the moment is create the tables
> above and use abind() (package:abind) to bring my two matrices together to
> form a multidimensional matrix. Example:
>
> > library(abind)
> > gh.abind = abind(g.mtrx, h.mtrx, along = 3)
> > dimnames(gh.abind) = list(Sex = c("F", "M"), Responses = c("Yes", "No"), Factors = c("A", "B"))
> > gh.abind
> , , Factors = A
>
>    Responses
> Sex Yes No
>   F  10 20
>   M  15 35
>
> , , Factors = B
>
>    Responses
> Sex Yes No
>   F  25  5
>   M  45  5
>
> Now I can use the simple mosaic function to plot the combined matrix
>
> > library(vcd)
> > mosaic(gh.abind)
>
> So that's it. I don't use any pearson-r shading in mosaic since I
> don't think it would be appropriate to try and model my weird multiple
> response tables (at the moment), but what I will do is look at the
> odds-ratio table and then manually colour the mosaic cells with high
> odds-ratios (greater than 2).
>
> I am literally having to type all this by hand into R, and as you can
> imagine, it gets cumbersome with large multi-column tables (which I
> have). Does anybody have any thoughts on my approach of using mosaic
> for this sort of data? And if so, any insight on how I can be a bit
> slicker with my R code?
>
> All help is appreciated and I hope that this question wasn't too long
> to read through.
>
> All the best,
> Marcos
>
>
>
>
> --
> PhD Engineering Candidate
> University of Cambridge
> Department of Engineering
> Centre for Sustainable Development
> mp542 at cam.ac.uk <mp542 at cam.ac.uk>
>
> [[alternative HTML version deleted]]
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>