[R] crosstabulation and unlist function

rmailbox at justemail.net rmailbox at justemail.net
Mon Oct 12 22:23:17 CEST 2009


What you're really saying is that you don't care about the distinction between "aa", "bb" and "cc". In that case, a different arrangement of the data will be more useful:

library(reshape)
df.melt <- melt(df, id.vars = "dd")
with(df.melt, table(dd, value))
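
If the reshape package is not available, roughly the same long arrangement can be built in base R. This is only a sketch, assuming df is constructed as in the original post; the names values, group and xt are made up for illustration.

```r
# Data from the original post
aa <- 1:5
bb <- c(NA, 2, NA, 4, 5)
cc <- c(1, 2, NA, 4, NA)
dd <- c("A", "B", "B", "A", "C")
df <- data.frame(aa, bb, cc, dd = as.factor(dd))

# unlist() stacks aa, bb and cc end to end (column by column),
# so dd is repeated once per stacked column to stay aligned with the values
values <- unlist(df[, 1:3], use.names = FALSE)
group  <- rep(df$dd, times = 3)

# table() drops NA pairs by default, leaving a 3 x 5 table of counts
xt <- table(group, values)
print(xt)
```

Because each value is paired with its dd before the NAs are dropped, rows stay matched to the right group even when some cells are missing.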

Eric


----- Original message -----
From: "eugen pircalabelu" <eugen_pircalabelu at yahoo.com>
To: "David Winsemius" <dwinsemius at comcast.net>
Cc: "R-help" <r-help at stat.math.ethz.ch>
Date: Mon, 12 Oct 2009 13:05:33 -0700 (PDT)
Subject: Re: [R] crosstabulation and unlist function

Hello,
First of all, thank you David for your reply, but sadly this is not what I wanted (I am sorry for not being more specific about my problem!).
   
 aa<-c(1:5)
 bb<-c(NA,2,NA,4,5)
 cc<-c(1,2,NA,4,NA)
 dd<-c("A","B","B","A","C")
 df<-data.frame(aa,bb,cc,dd=as.factor(dd))
 table(unlist(df[,1:3]))

> df
  aa bb cc dd
1  1 NA  1  A
2  2  2  2  B
3  3 NA NA  B
4  4  4  4  A
5  5  5 NA  C

I do not want to get this:
> tapply(apply(df[,1:3],1,sum, na.rm=TRUE), df$dd, sum)
 A  B  C
14  9 10

but a crosstabulation between table(unlist(df[,1:3])) and df$dd, which should look something like this:

   1  2  3  4  5
A  2  0  0  3  0
B  0  3  1  0  0
C  0  0  0  0  2

meaning that when dd is A, 1 appears 2 times, 2 doesn't appear, 3 doesn't appear, 4 appears 3 times and 5 doesn't appear; when dd is C, only 5 appears, 2 times (I am not really interested in the NA occurrences).
Hopefully, this time my question was a lot more clear.
Thank you very much !
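
As a quick check of what one row of such a table means, here is a minimal sketch that counts the values for dd == "A" only. It assumes df is built as in the original post; the name counts_A is just for illustration.

```r
# Data from the original post
aa <- 1:5
bb <- c(NA, 2, NA, 4, 5)
cc <- c(1, 2, NA, 4, NA)
dd <- c("A", "B", "B", "A", "C")
df <- data.frame(aa, bb, cc, dd = as.factor(dd))

# Keep only the rows where dd is "A", pool columns 1 to 3,
# and count each value (table() ignores NA by default)
counts_A <- table(unlist(df[df$dd == "A", 1:3]))
print(counts_A)
```

Values that never occur for "A" (2, 3 and 5 here) are simply absent from this one-group count; the full crosstabulation would show them as zeros.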


----- Original Message ----
From: David Winsemius <dwinsemius at comcast.net>
To: David Winsemius <dwinsemius at comcast.net>
Cc: eugen pircalabelu <eugen_pircalabelu at yahoo.com>; R-help <r-help at stat.math.ethz.ch>
Sent: Mon, October 12, 2009 9:36:39 PM
Subject: Re: [R] crosstabulation and unlist function


On Oct 12, 2009, at 3:25 PM, David Winsemius wrote:

> 
> On Oct 12, 2009, at 2:36 PM, eugen pircalabelu wrote:
> 
>> Hello R-users,
>> 
>> My toy example:
>> aa<-c(1:5)
>> bb<-c(NA,2,NA,4,5)
>> cc<-c(1,2,NA,4,NA)
>> dd<-c("A","B","B","A","C")
>> df<-data.frame(aa,bb,cc,dd=as.factor(dd))
>> table(unlist(df[,1:3]))
>> 
>> Can anyone point me to a function that lets me do a crosstabulation between table(unlist(df[,1:3])) and df$dd?
>> I want to find out, when dd==A (or B, or C), how many times the values 1, 2, 3, ... appear in df[,1:3].
>> Thank you very much!
> 
> One way would be to collect the row sums of those columns first, and then sum by index:
> 
> tapply(apply(df[,1:3],1,sum, na.rm=TRUE), df$dd, sum)
> A  B  C
> 14  9 10

This method is safer than working on table(unlist(df[, 1:3])) since it does not "break" when an entire row is empty.

> aa<-c(1,2,NA,4,5)
> bb<-c(NA,2,NA,4,5)
> cc<-c(1,2,NA,4,NA)
> dd<-c("A","B","B","A","C")
> df<-data.frame(aa,bb,cc,dd=as.factor(dd))
> table(unlist(df[,1:3]))

1 2 4 5
2 3 3 2     # missing row will no longer be aligned with "dd".
> tapply(table(unlist(df[,1:3])), df$dd, sum)
Error in tapply(table(unlist(df[, 1:3])), df$dd, sum) :
  arguments must have same length

> tapply(apply(df[,1:3],1,sum, na.rm=TRUE), df$dd, sum)
A  B  C
14  6 10
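
The length mismatch behind that error can be seen directly. The following is a sketch only, using the modified data above; rowSums() stands in for apply(..., 1, sum), and the names row_totals and group_sums are made up for illustration.

```r
# Modified data from the example above: row 3 has no non-missing values
aa <- c(1, 2, NA, 4, 5)
bb <- c(NA, 2, NA, 4, 5)
cc <- c(1, 2, NA, 4, NA)
dd <- c("A", "B", "B", "A", "C")
df <- data.frame(aa, bb, cc, dd = as.factor(dd))

# table() keeps one count per distinct non-NA value,
# so its length is unrelated to the number of rows
n_values <- length(table(unlist(df[, 1:3])))
n_rows   <- length(df$dd)

# Row-wise totals have exactly one entry per row (an all-NA row sums to 0
# with na.rm = TRUE), so they always line up with dd
row_totals <- rowSums(df[, 1:3], na.rm = TRUE)
group_sums <- tapply(row_totals, df$dd, sum)
print(group_sums)
```

Here n_values is 4 and n_rows is 5, which is why tapply() refuses the table, while row_totals always has one element per row.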


> 
> --
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

