[R] quick matching question

Marc Schwartz marc_schwartz at me.com
Fri Oct 28 16:59:50 CEST 2011


On Oct 28, 2011, at 9:49 AM, Ben Ganzfried wrote:

> Hey,
> 
> I'm trying to match patient identifiers from two separate input files, and
> then add information from one of the input files to the corresponding output
> file.  I'd greatly appreciate any help!
> 
> More specifically,
> Input_File_1 has a column header "bcr_patient_barcode"
> Input_File_2 has a column header "Barcode" and a column header "Batch"
> 
> I want my script to match the appropriate patient identifiers since
> "bcr_patient_barcode" and "Barcode" are not in the same order.  Then I want
> to add the information from "Batch" to the corresponding patient.
> 
> My (incorrect) code is below:
> 
> #batch
> tmp <- Input_File_2$Barcode
> tmp1 <- Input_File_1$bcr_patient_barcode
> 
> for i in tmp
> for item in tmp1
> if (tmp == tmp1) {
>  curated$batch <- Input_File_2$Batch
> }
> 
> Thanks!


See ?merge and then use something like:

  newDF <- merge(Input_File_2, Input_File_1, by.x = "Barcode", by.y = "bcr_patient_barcode")

Also, pay attention to the 'all', 'all.x' and 'all.y' arguments, which control whether only matching records are retained, or whether non-matching records from one or both datasets are kept as well. By default (all = FALSE), only rows with a match in both data frames are returned. merge() performs an "SQL-like" join operation.
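
As a rough illustration of how those arguments behave, here is a small sketch on made-up data. The barcodes ("P1" to "P4") and batch values are invented for the example, and the real input files will have more columns. The last step uses match(), which is an alternative to merge() and not part of the suggestion above; it may be handy if the row order of Input_File_1 needs to be preserved, since merge() sorts on the key by default.

```r
# Toy stand-ins for the two input files; all values are hypothetical.
Input_File_1 <- data.frame(
  bcr_patient_barcode = c("P3", "P1", "P4"),
  stringsAsFactors = FALSE
)
Input_File_2 <- data.frame(
  Barcode = c("P1", "P2", "P3"),
  Batch   = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Default (all = FALSE): keep only barcodes present in both data frames.
inner <- merge(Input_File_1, Input_File_2,
               by.x = "bcr_patient_barcode", by.y = "Barcode")

# all.x = TRUE: keep every row of Input_File_1; patients without a
# match in Input_File_2 get NA for Batch.
left <- merge(Input_File_1, Input_File_2,
              by.x = "bcr_patient_barcode", by.y = "Barcode",
              all.x = TRUE)

# Alternative (not merge): look up each patient's position in
# Input_File_2 and pull Batch in the original order of Input_File_1.
idx <- match(Input_File_1$bcr_patient_barcode, Input_File_2$Barcode)
batch_in_order <- Input_File_2$Batch[idx]
```

Here 'inner' should contain only the patients found in both files, while 'left' also keeps the unmatched patient with a missing Batch. Which variant is appropriate depends on whether patients without batch information should be dropped or retained.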

HTH,

Marc Schwartz


