[R] pairing columns based on a value

Fri Dec 19 07:34:27 CET 2014

I do not think that you need regular expressions for your problem.
Please see below:

 > d0 <- dat_unmatched
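 > tmp <- apply(d0, 1, function(x){
+ # first character of every cell in the row
+ first <- substr(x,1,1)
+ # positions of the codes starting with "T" or "Y"
+ # (%in% avoids the element-wise recycling that == would do here)
+ idx <- which(first %in% c("T", "Y"))
+ # pair each of them with the column before the first T/Y code
+ comb <- paste(x[idx[1]-1], x[idx], collapse=" ")
+ unlist(strsplit(comb, " "))
+ })
 > names(tmp) <- d0$ID
 > tmp
$MCZ4325
[1] "C23.2" "T43.2"

$GDR2343
[1] "M20.64" "Y32.1"  "M20.64" "T44.2"

$BZD2643
[1] "B83.2" "T43.2" "B83.2" "Y32.1" "B83.2" "T44.2"

$BCM3455
[1] "B83.2" "T43.2" "B83.2" "Y32.1"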

Is this what you are looking for?  I hope this helps.
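
Note that the code above links every T/Y code to the column just before
the first T/Y code in the row. Your dat_matched instead pairs Y32.1 with
F56.23 for BCM3455, that is, with the nearest preceding code that does
not start with "T" or "Y". If that is what you want, something along the
following lines might work. This is only a sketch: pair_codes() is a
made-up helper name, the data frame is retyped from your dput() output
as plain character columns, and I am assuming that empty cells can be
dropped and that a T/Y code with nothing in front of it is skipped.

```r
# Example data retyped from the dput() in the original question
dat_unmatched <- data.frame(
  ID  = c("MCZ4325", "GDR2343", "BZD2643", "BCM3455"),
  X.1 = c("C23.2", "F56.23", "B83.2", "B83.2"),
  X.2 = c("T43.2", "M20.64", "T43.2", "T43.2"),
  X.3 = c("R23.1", "Y32.1", "Y32.1", "F56.23"),
  X.4 = c("M23.5", "T44.2", "T44.2", "Y32.1"),
  X.5 = c("", "Q23.6", "", "Q23.6"),
  stringsAsFactors = FALSE
)

# pair_codes() is a hypothetical helper, not part of any package.
# It walks along one row and links each T/Y code to the most recent
# code that does not start with "T" or "Y".
pair_codes <- function(x) {
  x <- x[nzchar(x)]                        # drop empty cells (assumption)
  ty <- substr(x, 1, 1) %in% c("T", "Y")   # TRUE for T/Y codes
  out <- character(0)
  last <- NA_character_                    # most recent non-T/Y code seen
  for (i in seq_along(x)) {
    if (!ty[i]) {
      last <- x[i]
    } else if (!is.na(last)) {
      out <- c(out, last, x[i])            # append the linked pair
    }
  }
  unname(out)
}

# Apply the helper to every row, leaving out the ID column
codes <- as.matrix(dat_unmatched[, -1])
res <- lapply(seq_len(nrow(codes)), function(i) pair_codes(codes[i, ]))
names(res) <- dat_unmatched$ID

# Pad with "" to a common width and put the result back in a data frame
width <- max(lengths(res))
padded <- t(sapply(res, function(v) c(v, rep("", width - length(v)))))
dat_matched <- data.frame(ID = names(res), padded,
                          row.names = NULL, stringsAsFactors = FALSE)
```

On the four example rows, res$BCM3455 comes out as "B83.2" "T43.2"
"F56.23" "Y32.1", which agrees with the dat_matched you posted. The loop
grows a vector, so for 50,000 rows it may be slow; I have not timed it.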

Chel Hee Lee

On 12/18/2014 07:41 AM, Michael Dewey wrote:
> Not sure how much help it will be but there is a package on CRAN called
> icd9. Although clearly the codes are different in ICD 10 it may give you
> some hints. I suppose you could even email the maintainer to see whether
> there is an icd10 in the pipeline.
>
> On 17/12/2014 20:14, Robert Strother wrote:
>> I have a large dataset (~50,000 rows, 96 columns) of hospital
>> administrative data. Many of the columns contain clinical codes for
>> inpatient events (using ICD-10). A simplified example of the data is
>> below.
>>
>>> dput(dat_unmatched)
>> structure(list(ID = structure(c(4L, 3L, 2L, 1L), .Label = c("BCM3455",
>> "BZD2643", "GDR2343", "MCZ4325"), class = "factor"),
>>      X.1 = structure(c(2L, 3L, 1L, 1L), .Label = c("B83.2", "C23.2",
>>      "F56.23"), class = "factor"),
>>      X.2 = structure(c(2L, 1L, 2L, 2L), .Label = c("M20.64", "T43.2"),
>>      class = "factor"),
>>      X.3 = structure(c(2L, 3L, 3L, 1L), .Label = c("F56.23", "R23.1",
>>      "Y32.1"), class = "factor"),
>>      X.4 = structure(c(1L, 2L, 2L, 3L), .Label = c("M23.5", "T44.2",
>>      "Y32.1"), class = "factor"),
>>      X.5 = structure(c(1L, 2L, 1L, 2L), .Label = c("", "Q23.6"),
>>      class = "factor")), .Names = c("ID", "X.1", "X.2", "X.3",
>> "X.4", "X.5"), class = "data.frame", row.names = c(NA, -4L))
>>
>> I am interested in a set of codes that start with a "T" or a "Y", and
>> want to link them to the preceding column that does not begin with a
>> "T" or "Y". I suspect I will need to use regular expressions, and
>> likely a loop, but I am really out of my depth at this point.
>>
>> I would like the final dataset to look like:
>>
>>> dput(dat_matched)
>> structure(list(ID = structure(c(4L, 3L, 2L, 1L), .Label = c("BCM3455",
>> "BZD2643", "GDR2343", "MCZ4325"), class = "factor"),
>>      X.1 = structure(c(2L, 3L, 1L, 1L), .Label = c("B83.2", "C23.2",
>>      "M20.64"), class = "factor"),
>>      X.2 = structure(c(1L, 2L, 1L, 1L), .Label = c("T43.2", "Y32.1"),
>>      class = "factor"),
>>      X.3 = structure(c(1L, 4L, 2L, 3L), .Label = c("", "B83.2",
>>      "F56.23", "M20.64"), class = "factor"),
>>      X.4 = structure(c(1L, 2L, 3L, 3L), .Label = c("", "T44.2",
>>      "Y32.1"), class = "factor"),
>>      X.5 = structure(c(1L, 1L, 2L, 1L), .Label = c("", "B83.2"),
>>      class = "factor"),
>>      X = structure(c(1L, 1L, 2L, 1L), .Label = c("", "T44.2"),
>>      class = "factor")), .Names = c("ID", "X.1", "X.2", "X.3",
>> "X.4", "X.5", "X"), class = "data.frame", row.names = c(NA, -4L))
>>
>> Any help appreciated.
>>
>> Matthew