[R] counts and percentage of multiple categorical columns in R

arun smartpink111 at yahoo.com
Thu Jan 16 17:36:56 CET 2014


Also,
You can do the same with the previous solution:
result1 <- result[,-6]

vec1 <- unique(unlist(dat1))
result2 <- as.data.frame(t(sapply(dat1,function(x) {counts<- table(factor(x,levels=vec1)); percentage<-sprintf("%.1f",(counts/sum(counts))*100); c(paste0(counts,paste0("(",percentage,")")), sum(!is.na(x)))})),stringsAsFactors=FALSE)
 result2[,6] <- as.numeric(result2[,6])
 colnames(result2) <- colnames(result1)
 identical(result1,result2)
#[1] TRUE

A.K.

On Thursday, January 16, 2014 11:24 AM, arun <smartpink111 at yahoo.com> wrote:
Hi Jingxia,
May be this helps:

dat1 <- read.table(text="fatfreemilk fatmilk halfmilk 2fatmilk
A A A A
A B B A
B A A A
C C C C
D . A A
A E A E
C A B A
A . A A
A B . A
A A B E",sep="",header=TRUE,stringsAsFactors=FALSE,check.names=FALSE,na.strings=".")
 dat2 <- dat1
 dat2$id <- 1:nrow(dat2)

library(reshape2)
res <- acast(melt(dat2,id.var="id")[,-1],variable~value,length)
res[,-6] <- paste0(res[,-6],paste0("(",sprintf("%.1f",(res[,-6]/rowSums(res[,-6]))*100)),")")
 result <- as.data.frame(res,stringsAsFactors=FALSE)
#Either 
result$nonNAcount <- dim(dat1)[1]-as.numeric(result$`NA`)
#or
result$nonNAcount <- sapply(dat1,function(x) sum(!is.na(x)))
result[,-6]
#                  A       B       C       D       E nonNAcount
#fatfreemilk 6(60.0) 1(10.0) 2(20.0) 1(10.0)  0(0.0)         10
#fatmilk     4(50.0) 2(25.0) 1(12.5)  0(0.0) 1(12.5)          8
#halfmilk    5(55.6) 3(33.3) 1(11.1)  0(0.0)  0(0.0)          9
#2fatmilk    7(70.0)  0(0.0) 1(10.0)  0(0.0) 2(20.0)         10

A.K.





On Thursday, January 16, 2014 9:49 AM, Jingxia Lin <jingxia08 at gmail.com> wrote:

Dear Arun,

Sorry to bother you again.. But may I ask you for one more question regarding the data set? 
I am using the following method you offered for the data set. In our original data, there are some blank cells (i.e. data missing) in some columns. So in the output data frame, can we add an additional column to show the number of response (i.e. the number of non-blank cells)? I tried a couple of ways but failed (sorry I'm really not good at R...) I would be very grateful if you can help us with this problem at your convenience. Thank you! 

Best,
Jingxia

 dat2 <- dat1
 dat2$id <- 1:nrow(dat2)
library(reshape2)
 res <- dcast(melt(dat2,id.var="id")[,-1],variable~value,length)
row.names(res) <- res[,1]
res1 <- res[,-1]
res2 <- as.matrix(res1)
 res2[]<- paste0(res2,paste0("(",(res2/rowSums(res2))*100),")")
 as.data.frame(res2)


results 
#                A     B     C     D     E
#fatfreemilk 6(60) 1(10) 2(20) 1(10)  0(0)
#fatmilk     6(60) 2(20) 1(10)  0(0) 1(10)
#halfmilk    5(50) 4(40) 1(10)  0(0)  0(0)
#2fatmilk    7(70)  0(0) 1(10)  0(0) 2(20)




On Mon, Dec 30, 2013 at 3:50 PM, arun <smartpink111 at yahoo.com> wrote:

Dear Jingxia,
>No problem.  Happy New Year to you too!
>Arun
>
>
>
>
>
>
>
>
>On Monday, December 30, 2013 2:49 AM, Jingxia Lin <jingxia08 at gmail.com> wrote:
>
>Dear Arun,
>
>Thank YOU for your kind help :)  Happy new year!
>
>Best,
>Jingxia
>
>
>
>On Mon, Dec 30, 2013 at 3:43 PM, arun <smartpink111 at yahoo.com> wrote:
>
>Dear Jingxia,
>>
>>Glad that you were able to figure it out.  I was away from my computer.  My name is 'Arun Kirshna Sasikala-Appukuttan'.  I am a postdoctoral research fellow at Wayne State University, Detroit, MI, USA.  Thank you for the kind acknowledgment.
>>Regards,
>>Arun
>>
>>
>>
>>
>>
>>
>>
>>On Sunday, December 29, 2013 9:25 PM, Jingxia Lin <jingxia08 at gmail.com> wrote:
>>
>>Dear A.K.
>>
>>I also solved the character problem by using library(xlsx). So everything is fine now. Thank you again!
>>
>>Best,
>>Jingxia
>>
>>
>>
>>On Mon, Dec 30, 2013 at 10:17 AM, Jingxia Lin <jingxia08 at gmail.com> wrote:
>>
>>Dear A.K.,
>>>
>>>
>>>Thank you a lot! I tried your way and it works perfect. The only thing I haven't figured out is that while I exported the final data frame into an excel file, all Chinese characters were not shown correctly (my original data has Chinese in row/column names). Other than that, everything is great! 
>>>Would you mind letting me know your name so that we can acknowledge your help in our paper? Thank you again! 
>>>
>>>
>>>Best
>>>Jingxia
>>>
>>>
>>>
>>>
>>>
>>>On Mon, Dec 30, 2013 at 3:48 AM, arun <smartpink111 at yahoo.com> wrote:
>>>
>>>Hi,
>>>>Try:
>>>>dat1 <- read.table(text="fatfreemilk fatmilk halfmilk 2fatmilk
>>>>
>>>>A A A A
>>>>A B B A
>>>>B A A A
>>>>C C C C
>>>>D A A A
>>>>A E A E
>>>>C A B A
>>>>A A A A
>>>>A B B A
>>>>A A B E",sep="",header=TRUE,stringsAsFactors=FALSE,check.names=FALSE)
>>>> dat2 <- dat1
>>>> dat2$id <- 1:nrow(dat2)
>>>>library(reshape2)
>>>> res <- dcast(melt(dat2,id.var="id")[,-1],variable~value,length)
>>>>row.names(res) <- res[,1]
>>>>res1 <- res[,-1]
>>>>res2 <- as.matrix(res1)
>>>> res2[]<- paste0(res2,paste0("(",(res2/rowSums(res2))*100),")")
>>>> as.data.frame(res2)
>>>>#                A     B     C     D     E
>>>>#fatfreemilk 6(60) 1(10) 2(20) 1(10)  0(0)
>>>>#fatmilk     6(60) 2(20) 1(10)  0(0) 1(10)
>>>>#halfmilk    5(50) 4(40) 1(10)  0(0)  0(0)
>>>>#2fatmilk    7(70)  0(0) 1(10)  0(0) 2(20)
>>>>A.K.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>On Sunday, December 29, 2013 1:07 PM, Jingxia Lin <jingxia08 at gmail.com> wrote:
>>>>Dear R helpers,
>>>>
>>>>I have a data sheet (“milk”) with four types of milk from five brands (A,
>>>>B, C, D, E), the column shows the brands that each customer chose for each
>>>>type of the milk they bought. The data sheet goes like below. You can see
>>>>for some type of milk, no brand is chosen.
>>>>
>>>>fatfreemilk fatmilk halfmilk 2fatmilk
>>>>A A A A
>>>>A B B A
>>>>B A A A
>>>>C C C C
>>>>D A A A
>>>>A E A E
>>>>C A B A
>>>>A A A A
>>>>A B B A
>>>>A A B E
>>>>
>>>>I want to summarize each column so that for each type of milk, i know the
>>>>counts and percentages of the brands chosen for each milk type. I tried
>>>>"summary" in R, but the result is not shown nicely. How I can display the
>>>>result in a way like below:
>>>>A B C D E
>>>>fatfreemilk 6(60) 1(10) 2(20) 1(10) 0(0)
>>>>fatmilk 6(60) 2(20) 1(10) 0(10) 1(10)
>>>>halfmilk 5(50) 4(40) 1(10) 0(0) 0(0)
>>>>2fatmilk 7(70) 0(0) 1(10) 0(0) 2(20)
>>>>
>>>>Thank you!
>>>>
>>>>
>>>>    [[alternative HTML version deleted]]
>>>>
>>>>______________________________________________
>>>>R-help at r-project.org mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>
>




More information about the R-help mailing list