[R] Advice needed on awkward tables

Gabor Grothendieck ggrothendieck at gmail.com
Tue May 11 13:45:48 CEST 2010


Reduce the non-unique case to the unique case in the first line:
aggregate pastes together the FEATURE strings belonging to each CATEGORY.
Then form the ids, ensuring that ids has names.  Finally, sapply over the
ids, summing across rows with drop = FALSE so that the single-feature and
multi-feature cases in your code are both handled at once.  Adding 0
converts the logical result to numeric.
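
As a small sketch of what drop = FALSE buys you, here is a made-up 3 x 3
matrix (m and its values are hypothetical, not your data). Selecting a
single column normally drops the dimensions and returns a plain vector,
which rowSums cannot handle; with drop = FALSE it stays a one-column
matrix, so the same expression works for one feature or several.

```r
# Hypothetical toy matrix, for illustration only
m <- matrix(c(0, 1, 0,
              1, 1, 0,
              0, 0, 0), nrow = 3, byrow = TRUE,
            dimnames = list(paste("Patient", 1:3), paste("Feature", 1:3)))

one  <- m[, 2]                # single column: dimensions dropped, plain vector
kept <- m[, 2, drop = FALSE]  # single column kept as a 3 x 1 matrix

# The same expression now covers both the one-feature and multi-feature case
single <- 0 + (rowSums(m[, 2, drop = FALSE]) > 0)
multi  <- 0 + (rowSums(m[, c(1, 3), drop = FALSE]) > 0)
```

Without drop = FALSE, rowSums(m[, 2]) stops with an error because its
argument must be an array of at least two dimensions.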

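# Collapse duplicated categories: paste each CATEGORY's FEATURE strings into one
ag <- aggregate(data.table.b[3], data.table.b[2], paste, collapse = ";")

# Split into feature indices, one named list element per category
ids <- strsplit(as.character(ag$FEATURE), ";")
names(ids) <- ag$CATEGORY

# 1 if any of the category's features is positive for the patient, else 0
f <- function(fs) 0+(rowSums(data.table.a[, as.numeric(fs), drop = FALSE]) > 0)
sapply(ids, f)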

Here is the output:

           Cardiac CNS Gastro Respiratory
Patient 1        1   1      0           0
Patient 2        1   0      1           0
Patient 3        1   0      1           0
Patient 4        0   0      0           1
Patient 5        0   0      1           0
Patient 6        1   1      1           1
Patient 7        0   0      1           0
Patient 8        1   0      1           0
Patient 9        1   0      0           0
Patient 10       1   0      1           1
Patient 11       1   0      1           1
Patient 12       1   1      1           1
Patient 13       1   0      1           0
Patient 14       1   1      1           0
Patient 15       1   0      1           1
Patient 16       1   0      1           1
Patient 17       1   1      1           0
Patient 18       0   0      1           0
Patient 19       1   0      1           1
Patient 20       1   0      1           0
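
Since data.table.a is built with runif, the numbers above will differ from
run to run. A self-contained sketch for checking the approach is given
here; the set.seed call and its value, the name outcome for the result and
the cardiac.check comparison are my additions and not part of your code.
The check relies only on the fact that Cardiac now covers features 3, 5
and 7 in the non-unique example.

```r
set.seed(1)  # arbitrary seed, added only to make the run reproducible

data.table.a <- matrix(round(runif(160)), nrow = 20, ncol = 8,
                       dimnames = list(paste("Patient", 1:20),
                                       paste("Feature", 1:8)))
data.table.b <- data.frame(
  ID = 1:6,
  CATEGORY = c("Cardiac", "Cardiac", "Respiratory", "Gastro", "Gastro", "CNS"),
  FEATURE = c("3;5", "7", "4", "6", "1;2", "8"))

# Same three steps as in the solution above
ag <- aggregate(data.table.b[3], data.table.b[2], paste, collapse = ";")
ids <- strsplit(as.character(ag$FEATURE), ";")
names(ids) <- ag$CATEGORY
f <- function(fs) 0 + (rowSums(data.table.a[, as.numeric(fs), drop = FALSE]) > 0)
outcome <- sapply(ids, f)

# Independent check: Cardiac is positive exactly when feature 3, 5 or 7 is
cardiac.check <- 0 + (data.table.a[, 3] | data.table.a[, 5] | data.table.a[, 7])
all(outcome[, "Cardiac"] == cardiac.check)
```

The column order of outcome follows the sort order of the category names,
which can depend on the locale, so it is safer to index columns by name
than by position.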



On Tue, May 11, 2010 at 2:05 AM, Greg Orm <splicemix at gmail.com> wrote:
> Apologies.
>
> Let me clarify. I have included my code below :
>
> data.table.b represents the medical nomenclature, whereas data.table.a is a
> patient derived database.
> data.table.b$CATEGORY categorizes features (e.g. 'cardiac', 'respiratory'),
> whereas data.table.b$FEATURE is a corresponding disease (e.g. under CATEGORY
> 'cardiac', there could be heart attack, heart failure as FEATURES)
>
> When data.table.b$CATEGORY is unique, for example below, where there are 20
> patients in data.table.a, and data.table.b contains 6 categories (Cardiac …
> Endocrine), with a total of 8 features (1:8), it is not a problem for me to
> extract the data (e.g. for features 3 and 5, so long as one is positive, the
> final category under Cardiac is positive)
>
> data.table.a <-
> matrix(data=round(runif(160)),nrow=20,ncol=8,dimnames=list(paste("Patient",1:20),paste("Feature",1:8)))
> data.table.b <- data.frame(ID=c(1,2,3,4,5,6),CATEGORY=c("Cardiac","Respiratory","Gastro","Renal","CNS","Endocrine"),FEATURE=c("3;5","7","4","6","1;2","8"))
> ids <- strsplit(as.character(data.table.b$FEATURE),";")
>
> i=vector()
> outcome=matrix(data=NA,nrow=20,ncol=6)
>
> for (i in 1:6) {
>   if (is.vector(data.table.a[, as.integer(ids[[i]])])) {
>     outcome[, i] <- data.table.a[, as.integer(ids[[i]])]
>   } else {
>     outcome[, i] <- rowSums(data.table.a[, as.integer(ids[[i]])]) > 0
>   }
> }
> # the if/else is needed because I can't figure out what command can work both
> # on a vector (single feature) and on an array (multiple features in the same
> # cell, such as 3;5)
> # rowSums is used here kind of like a Boolean OR for the categories
>
> colnames(outcome) <-  data.table.b$CATEGORY
> rownames(outcome) <- rownames(data.table.a)
>
> #outcome is what I need.
>
>
> The problem I am having, because I am not very good at manipulating
> tables, is how to handle a situation where CATEGORY is non-unique, such
> as in the example below:
>
>
> data.table.a <-
> matrix(data=round(runif(160)),nrow=20,ncol=8,dimnames=list(paste("Patient",1:20),paste("Feature",1:8)))
> data.table.b <- data.frame(ID=c(1,2,3,4,5,6),CATEGORY=c("Cardiac","Cardiac","Respiratory","Gastro","Gastro","CNS"),FEATURE=c("3;5","7","4","6","1;2","8"))
>
> Thanks.
>
> I hope this is a bit clearer.
>
> Regards,
> Greg
>
> On Tue, May 11, 2010 at 1:34 AM, Daniel Malter <daniel at umd.edu> wrote:
>
>>
>> Hi, even after rereading, I have little clue what exactly you are trying
>> to do. It'd help if you provided a more concise, step-by-step description
>> and/or the smallest unambiguous example of the two tables AND of what
>> should come out at the end. Also, except for relatively trivial problems,
>> the list typically likes to see some effort of your own and where you are
>> stuck, rather than being asked to solve the whole problem.
>>
>> Best,
>> Daniel
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Advice-needed-on-awkward-tables-tp2173289p2173341.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


