[R] Advice needed on awkward tables

Jim Lemon jim at bitwrit.com.au
Tue May 11 12:39:16 CEST 2010

On 05/11/2010 02:05 PM, Greg Orm wrote:
> Dear r-help list members,
> I am quite new to R, and hope to seek advice from you about a problem I have
> been cracking my head over. Apologies if this seems like a simple problem.
> I have essentially two tables. The first (Table A) is a standard patient
> clinicopathological data table, where rows correspond to patient IDs and
> columns correspond to clinical features. Records in this table are stored as
> 1 or 0 (1 denoting presence of the feature). An example is provided below.
> The second (Table B) is a table that represents a 'key' to Table A. This
> Table B has a category field, as well as a feature field which links to
> Table A. Unfortunately, this is a one-to-many relationship, and the numbers
> in the feature field represent the respective columns in Table A, delimited
> by semicolons. So in the example below, I need to collapse the data in Table
> A into a table with nrow equivalent to the number of patients and ncol =
> number of categories. The collapsing of each category will be based on a
> Boolean OR, or the equivalent any() in R (so long as 1 of the features is
> true, the resulting outcome will be true).
> data.table.a <- matrix(data=round(runif(100)), nrow=10, ncol=10,
>   dimnames=list(paste("Patient",1:10), paste("Feature",1:10)))
> data.table.b <- data.frame(ID=c(1,2,3,4,5,6,7), CATEGORY=c(1,2,3,3,4,5,5),
>   FEATURE=c("9","3;5","7","4","6;10","1;2","8"))
> In the example tables above, we hope to collapse the features by category,
> so the final desired output will have a total of 10 patients as rows and a
> total of 5 categories as columns after collapsing the features by a
> Boolean OR (i.e. if any of the features in a category is present, the
> result will be TRUE).
> I apologize for the apparently awkward table, but this is what I had to
> start with. I tried expanding data.table.b$FEATURE using strsplit, which
> resulted in a list, and then I got stuck there for a long time.

Hi Greg,
Messy, but I think it works.

feature2category<-function(dta,dtb) {
  for(patrow in 1:dim(dta)[1]) {
   for(catrow in 1:dim(dtb)[1]) {
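
The function above is cut off in this archived copy. As a rough sketch of one way to get the result Greg describes (not necessarily the approach the reply took), the semicolon-delimited FEATURE field can be split with strsplit(), the column numbers pooled per CATEGORY, and each group of columns collapsed with a row-wise OR. The name collapse_by_category and the "Category n" column labels are made up for this illustration, and the seed is arbitrary.

```r
# Hypothetical helper (not from the original reply): collapse the feature
# columns of `dta` into one logical column per category listed in `dtb`.
collapse_by_category <- function(dta, dtb) {
  # as.character() guards against FEATURE having been stored as a factor
  feature_idx <- lapply(strsplit(as.character(dtb$FEATURE), ";"), as.integer)
  # pool the feature column numbers of all key rows sharing a category
  by_category <- split(feature_idx, dtb$CATEGORY)
  out <- sapply(by_category, function(idx) {
    cols <- unique(unlist(idx))
    # Boolean OR across the selected feature columns, one value per patient
    rowSums(dta[, cols, drop = FALSE]) > 0
  })
  colnames(out) <- paste("Category", names(by_category))
  out
}

# Example data as posted in the question
set.seed(1)  # arbitrary seed, only so the random example is reproducible
data.table.a <- matrix(round(runif(100)), nrow = 10, ncol = 10,
  dimnames = list(paste("Patient", 1:10), paste("Feature", 1:10)))
data.table.b <- data.frame(ID = c(1, 2, 3, 4, 5, 6, 7),
  CATEGORY = c(1, 2, 3, 3, 4, 5, 5),
  FEATURE = c("9", "3;5", "7", "4", "6;10", "1;2", "8"))

result <- collapse_by_category(data.table.a, data.table.b)
dim(result)  # 10 patients by 5 categories
```

Categories 3 and 5 each span two rows of the key table, which is why the column numbers are pooled with split() before the OR is taken. The sketch assumes at least two patients, since sapply() returns a plain vector rather than a matrix for a single-row input.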

