[R] extract data from a data frame field

jim holtman jholtman at gmail.com
Tue Jun 7 04:58:51 CEST 2011


Here is a start; you can change the column names:

> x
   chr  start    end              peak_loc cluster_TC strand peak_TC
1 chr1 564620 564649 chr1:564644..564645,+         94      +      10
2 chr1 565369 565404 chr1:565371..565372,+        217      +       8
3 chr1 565463 565541 chr1:565480..565481,+       1214      +      15
4 chr1 565653 565697 chr1:565662..565663,+       1031      +      28
5 chr1 565861 565922 chr1:565883..565884,+        316      +      12
6 chr1 566537 566573 chr1:566564..566565,+        119      +      11
> y <- sub("^.*:([[:digit:]]+)\\.\\.([[:digit:]]+).*", "\\1 \\2", x$peak_loc)
> y
[1] "564644 564645" "565371 565372" "565480 565481" "565662 565663"
[5] "565883 565884" "566564 566565"
> y <- strsplit(y, ' ')
> y
[[1]]
[1] "564644" "564645"

[[2]]
[1] "565371" "565372"

[[3]]
[1] "565480" "565481"

[[4]]
[1] "565662" "565663"

[[5]]
[1] "565883" "565884"

[[6]]
[1] "566564" "566565"

> x.new <- cbind(x, do.call(rbind, y))
> x.new
   chr  start    end              peak_loc cluster_TC strand peak_TC      1      2
1 chr1 564620 564649 chr1:564644..564645,+         94      +      10 564644 564645
2 chr1 565369 565404 chr1:565371..565372,+        217      +       8 565371 565372
3 chr1 565463 565541 chr1:565480..565481,+       1214      +      15 565480 565481
4 chr1 565653 565697 chr1:565662..565663,+       1031      +      28 565662 565663
5 chr1 565861 565922 chr1:565883..565884,+        316      +      12 565883 565884
6 chr1 566537 566573 chr1:566564..566565,+        119      +      11 566564 566565
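
To get all the way to what was asked for (named, numeric columns in place of peak_loc), something along these lines should work. Treat it as a sketch rather than the definitive answer: the names start_peak and end_peak come from the original question, `pattern` and `pos` are helper names I made up for illustration, and I am assuming every peak_loc value has the form chr:start..end,strand.

```r
# Rebuild the example data frame from the post.
x <- data.frame(
  chr        = "chr1",
  start      = c(564620L, 565369L, 565463L, 565653L, 565861L, 566537L),
  end        = c(564649L, 565404L, 565541L, 565697L, 565922L, 566573L),
  peak_loc   = c("chr1:564644..564645,+", "chr1:565371..565372,+",
                 "chr1:565480..565481,+", "chr1:565662..565663,+",
                 "chr1:565883..565884,+", "chr1:566564..566565,+"),
  cluster_TC = c(94L, 217L, 1214L, 1031L, 316L, 119L),
  strand     = "+",
  peak_TC    = c(10L, 8L, 15L, 28L, 12L, 11L),
  stringsAsFactors = FALSE
)

# The two dots are escaped so that they match literal "." characters.
pattern <- "^.*:([[:digit:]]+)\\.\\.([[:digit:]]+).*$"

# Pull each capture group out separately and convert it to integer.
start_peak <- as.integer(sub(pattern, "\\1", x$peak_loc))
end_peak   <- as.integer(sub(pattern, "\\2", x$peak_loc))

# Put the two new columns where peak_loc used to be.
pos <- match("peak_loc", names(x))
x.new <- data.frame(
  x[seq_len(pos - 1)],      # columns before peak_loc
  start_peak = start_peak,
  end_peak   = end_peak,
  x[-seq_len(pos)],         # columns after peak_loc
  stringsAsFactors = FALSE
)
x.new
```

Calling sub() once per capture group avoids the strsplit()/rbind() round trip, and as.integer() gives you numbers you can compare against start and end directly. If some rows might not match the pattern, check for that first (for example with grepl(pattern, x$peak_loc)), because sub() returns the input unchanged when there is no match and as.integer() will then give NA with a warning.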


On Mon, Jun 6, 2011 at 8:22 PM, ads pit <deconstructed.morning at gmail.com> wrote:
> Hi all,
> I am given a data frame in which one of the columns holds several pieces of
> information together; see column 4, peak_loc:
>    chr  start    end              peak_loc cluster_TC strand peak_TC
> 1 chr1 564620 564649 chr1:564644..564645,+         94      +      10
> 2 chr1 565369 565404 chr1:565371..565372,+        217      +       8
> 3 chr1 565463 565541 chr1:565480..565481,+       1214      +      15
> 4 chr1 565653 565697 chr1:565662..565663,+       1031      +      28
> 5 chr1 565861 565922 chr1:565883..565884,+        316      +      12
> 6 chr1 566537 566573 chr1:566564..566565,+        119      +      11
>
>
> I am trying to find out whether there is a way to extract the coordinates
> given in the 4th column and replace that column with two others holding
> the start and end coordinates. So instead of chr1:564644..564645,+
> I would obtain:
> start_peak  end_peak
> 564644      564645
>
> Best,
> nanami



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


