[R] Help with isolating and comparing data from two files.

ajn21 ajn21 at case.edu
Mon May 23 06:00:10 CEST 2011


Hello,

I was hoping that someone would be able to help me or at least point me in
the right direction regarding a problem I am having. I am a new R user, and
I've been trying to read tutorials but they haven't been much help to me so
far.

The problem is relatively simple as I've already created working solutions
in Java and Perl, but I need a solution in R as well. 

I have two text files, say pos.txt and reg.txt. In pos.txt, the data is
listed like this, for example:

c22 1445  - CG 1 4
c22 1542 + CG 2 3
c22 1678 + CG 13 15
...

etc. for thousands of lines. The most important column is column 2, which
lists "position" (e.g. 1445, 1542, 1678). In reg.txt, data is listed as:

c22 1440 1500 cpg: 44 56 ......
c22 1520 1700 cpg: 56 87 ......
c22 1800 1900 cpg: 58 90 ......
...

where the values in column 2 are the "start" positions and the values in
column 3 are the "end" positions. There are 10 columns in total, but I have
listed only the first few. Also, the two text files have different numbers
of lines.
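
For reference, here is a minimal sketch of how the two layouts above could be
read with read.table. The column names (chr, position, strand, context, n1,
n2, label, v1, v2) are placeholders I made up for illustration, and the
sample uses only the fields shown above, not all 10 columns of reg.txt. The
text= argument stands in for the real file names so the snippet runs on its
own.

```r
# Sample rows copied from the examples above (reg.txt truncated to 6 fields).
pos_lines <- "c22 1445 - CG 1 4
c22 1542 + CG 2 3
c22 1678 + CG 13 15"

reg_lines <- "c22 1440 1500 cpg: 44 56
c22 1520 1700 cpg: 56 87
c22 1800 1900 cpg: 58 90"

# Column names are hypothetical; colClasses keeps "+" and "-" as plain text.
pos.data <- read.table(
  text = pos_lines, header = FALSE,
  col.names  = c("chr", "position", "strand", "context", "n1", "n2"),
  colClasses = c("character", "integer", "character",
                 "character", "integer", "integer")
)

reg.data <- read.table(
  text = reg_lines, header = FALSE,
  col.names  = c("chr", "start", "end", "label", "v1", "v2"),
  colClasses = c("character", "integer", "integer",
                 "character", "integer", "integer")
)

# Inspect the structure of each data frame.
str(pos.data)
str(reg.data)
```

With the real files, text = pos_lines would be replaced by the file name,
e.g. read.table("pos.txt", ...), and col.names and colClasses would need one
entry per column actually present in the file.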


Essentially, for each position listed in column 2 of pos.txt, I need to find
the region in reg.txt (based on its start and end positions) that contains
it. Then I need to print:

c22 "start" "end" "position" + 1 5 

where the last 3 columns are also from pos.txt (i.e. not every line ends in
+ 1 5; each line ends with the values from the corresponding row of
pos.txt). Also, the position needs to fall between the start and end
positions.

So far I've been able to use read.table to create a data frame for each text
file, and I've also named each column (e.g. reg.data$end) and I can output
each column individually. However, the problem I keep facing is how to
compare the numbers for "position" in pos.txt to the numbers for "start" and
"end" in reg.txt. I tried to use: 

if ((pos >= start) | (pos <= end))..

but an error comes up that says the files aren't the same length.
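
A likely reason for the length complaint is that pos, start and end are whole
columns of different lengths, while if() expects a single TRUE or FALSE.
Note also that "within the region" needs & (and) rather than | (or), since
almost any number is either >= start or <= end. A minimal sketch, assuming
plain numeric vectors with the sample values from above, tests one position
at a time against every region:

```r
# Sample values from the examples above; in practice these would be
# the columns of the two data frames.
start <- c(1440, 1520, 1800)
end   <- c(1500, 1700, 1900)
pos   <- c(1445, 1542, 1678)

# One position against all regions: a logical vector, one entry per region.
p <- pos[1]
inside <- (p >= start) & (p <= end)

# Index of the matching region(s) for that one position.
hit <- which(inside)

# The same test for every position; [1] keeps the first matching region
# and yields NA when a position falls in no region.
hits <- sapply(pos, function(p) which(p >= start & p <= end)[1])

print(inside)
print(hits)
```

Here hits holds, for each position, the row number of the first region that
contains it, which can then be used to index the region data frame.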

In Java and Perl I used nested loops to cycle through each element in one
file, and compare it to every element in the other file, and then printed to
a new text file. As such, I was trying to learn a bit more about arrays in
R, but if you know of a better way in R to do this then please let me know.
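
One possible alternative to nested loops, sketched here under assumptions, is
to pair every region with every position on the same chromosome using merge()
and then keep only the pairs where the position falls inside the region. The
column names (chr, position, strand, n1, n2) are hypothetical, and the
pairing step builds regions x positions rows per chromosome, so this is only
a sketch for files of moderate size.

```r
# Small stand-ins for the two data frames (column names are hypothetical).
pos.data <- data.frame(
  chr      = c("c22", "c22", "c22"),
  position = c(1445, 1542, 1678),
  strand   = c("-", "+", "+"),
  n1       = c(1, 2, 13),
  n2       = c(4, 3, 15),
  stringsAsFactors = FALSE
)
reg.data <- data.frame(
  chr   = c("c22", "c22", "c22"),
  start = c(1440, 1520, 1800),
  end   = c(1500, 1700, 1900),
  stringsAsFactors = FALSE
)

# Pair each region with each position on the same chromosome.
pairs <- merge(reg.data, pos.data, by = "chr")

# Keep only pairs where the position lies within [start, end].
hits <- pairs[pairs$position >= pairs$start & pairs$position <= pairs$end, ]

# Order by position and arrange columns as: chr start end position strand n1 n2.
hits <- hits[order(hits$chr, hits$position),
             c("chr", "start", "end", "position", "strand", "n1", "n2")]

# file = "" prints to the console; give a file name to write to disk instead.
write.table(hits, file = "", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

If the regions are sorted and do not overlap, findInterval() on the start
positions may be a lighter-weight option, but that depends on properties of
reg.txt that are not stated here.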

Any help is greatly appreciated.

Thank you,
AJ



