[R] merged files

David Winsemius dwinsemius at comcast.net
Thu Apr 29 18:05:13 CEST 2010


On Apr 29, 2010, at 10:21 AM, Alex Jameson wrote:

> Hi,
>
> I have two files (file1.txt and file2.txt) which I would like to merge
> based on certain criteria: the data should be combined by matching
> GeneID and Exons. I have used the merge option,

Huh? What is the merge option? (There is a merge _function_.)

> but it

"It"?  Please provide the code you used. Have you yet read the Posting  
Guide as I urged you earlier?

> does not give me the desired outcome.
> merged.txt shows the result I would like.
>

Given that those two files have no GeneID and Exons in common (after I
took your mangled HTML posting and fixed each one to create readable
files), I would expect this call, which implements the merge you
requested above, to produce 0 rows:

merge(dtd, File2, by=c("GeneID", "Exons"))  # which would be an inner join
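A minimal sketch of the difference between that inner join and an outer join. The data frames `affy` and `agilent` are hypothetical stand-ins built from a few rows of the posted tables; their column names are assumed from the posted headers.

```r
# Hypothetical stand-ins for File1 and File2, using a few rows from the
# quoted tables; column names are assumed from the posted headers.
affy <- data.frame(
  AffyProbe  = c("1007_s_at:1105:483", "1007_s_at:1119:177"),
  GeneSymbol = "DDR1", GeneID = 780L, Exons = 21L, Chrom = 6L,
  AffyStart  = c(30975403L, 30975549L),
  AffyEnd    = c(30975427L, 30975573L)
)
agilent <- data.frame(
  AgilentProbe = c("A_23_P100001", "A_23_P100022"),
  GeneSymbol   = c("FAM174B", "SV2B"),
  GeneID       = c(400451L, 9899L), Exons = c(5L, 14L), Chrom = 15L,
  AgilentStart = c(90961852L, 89639333L),
  AgilentEnd   = c(90961793L, 89639392L)
)

# Inner join on the two key columns: only rows whose (GeneID, Exons)
# pair occurs in both tables are kept, so this result has 0 rows.
inner <- merge(affy, agilent, by = c("GeneID", "Exons"))

# Full outer join: all = TRUE keeps every row from both inputs and fills
# the columns coming from the non-matching side with NA.
outer <- merge(affy, agilent, by = c("GeneID", "Exons"), all = TRUE)
```

Using `all.x = TRUE` or `all.y = TRUE` instead gives a left or right outer join. Because GeneSymbol and Chrom occur in both inputs but are not listed in `by`, merge() keeps both copies with `.x` and `.y` suffixes; adding them to `by` avoids the duplicates.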

Many (most?) of the numbers in the third desired file that we are  
seeing in mangled form do not appear in either of those two input  
files, so you appear to be requesting that we hack into your system to  
get them. Now, what was it that you really wanted? (And no more HTML
postings ... and use the dput function. That is the equivalent of the
dump method described in the Posting Guide, which (again) I urge you to read.)
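To illustrate what dput() produces, here is a minimal sketch using a hypothetical two-row excerpt of File1 (`file1_head` is an invented name). The printed structure() call is plain text that can go into a posting, and pasting it back into R recreates the same object.

```r
# Hypothetical two-row excerpt of File1 (column names taken from its header)
file1_head <- data.frame(
  AffyProbe = c("1007_s_at:1105:483", "1007_s_at:1119:177"),
  GeneID    = c(780L, 780L),
  Exons     = c(21L, 21L),
  AffyStart = c(30975403L, 30975549L),
  AffyEnd   = c(30975427L, 30975573L)
)

# dput() prints an R expression (a structure() call) that rebuilds the object;
# that printed text is what would be pasted into a plain-text message.
dput(file1_head)

# Round trip: capture the printed text and evaluate it to get the object back.
txt <- paste(capture.output(dput(file1_head)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
```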

-- 
David
>
>
>
> *File1.txt*
>
>   AffyProbe          ProbeType Flag GeneSymbol GeneID Exons Chrom Strand AffyStart AffyEnd
> 1 1007_s_at:1105:483 0         0    DDR1       780    21    6     +      30975403  30975427
> 2 1007_s_at:1119:177 0         0    DDR1       780    21    6     +      30975549  30975573
> 3 1007_s_at:1136:469 0         0    DDR1       780    21    6     +      30975766  30975790
> 4 1007_s_at:192:205  0         0    DDR1       780    21    6     +      30975523  30975547
> 5 1007_s_at:474:1161 0         0    DDR1       780    21    6     +      30975745  30975769
> 6 1007_s_at:504:983  0         0    DDR1       780    21    6     +      30975575  30975599
> 7 1007_s_at:50:779   0         0    DDR1       780    21    6     +      30975758  30975782
>
> *File2.txt*
>
>    AgilentProbe ProbeType Flag GeneSymbol GeneID Exons Chrom Strand AgilentStart AgilentEnd
>  1 A_23_P100001 0         0    FAM174B    400451 5     15    -      90961852     90961793
>  2 A_23_P100022 0         0    SV2B       9899   14    15    +      89639333     89639392
>  3 A_23_P100056 0         0    RBPMS2     348093 8     15    -      62819428     62819369
>  4 A_23_P100074 0         0    AVEN       57099  6     15    -      31946031     31945972
>  5 A_23_P100092 0         0    ZSCAN29    146050 5     15    -      41440680     41440621
>  6 A_23_P100103 0         0    VPS39      23339  24    15    -      40240319     40240260
>  7 A_23_P100111 0         0    CHP        11261  7     15    +      39358845     39358904
>  8 A_23_P100127 0         0    CASC5      57082  11    15    +      38704817     38704876
>  9 A_23_P100133 0         0    ATMIN      23300  4     16    +      79636596     79636655
> 10 A_23_P100141 0         0    UNKL       64718  12    16    -      1355346      1355287
>
>
> *merged.txt (should look like this)*
>
> GeneSymbol GeneID Exons Chrome AffyMatrixProbeID     AffyStart  AffyEnd    AgilentProbeID AgilentStart AgilentEnd
> DDR1       780    21    6                                                  A_24_P123601   30975848     30975907
> RFC2       5982   10    7      1053_at:120:925,      73287845,  73287821,  A_23_P93823    73287861     73287802
>                                1053_at:504:41,       73287869,  73287845,
>                                1053_at:522:871,      73287863,  73287839,
>                                1053_at:828:1025,     73287881,  73287857,
>                                203696_s_at:291:651   73287850   73287826
> RFC2       5982   11    7
> HSPA6      3310   1     1                                                  A_23_P114903   159762782    159762841
> PAX8       7849   12    2                                                  A_23_P210001   113691555    113691496
> GUCA1A     2978   6     6
> UBA7       7318   24    3      1294_at:1079:379,     49818386,  49818362,
>                                1294_at:361:881,      49818398,  49818374,
>                                203281_s_at:524:889,  49818378,  49818354,
>                                203281_s_at:678:1017, 49818434,  49818420,
>                                203281_s_at:68:1153   49818422   49818398
>
>
> Sorry for the long tables,
>
> Thanks
>
> Alex
>
> Student
> University of Colorado
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


