[R] Table Intersection

Martin Morgan mtmorgan at fhcrc.org
Wed Jan 18 19:16:51 CET 2012


On 01/18/2012 07:25 AM, rantree wrote:
> I've got two tables....
>
> first one(table1):
>
> ID     chrom        start         end
>
> Ex1       2                152          180
> Ex2       10              2000          2220
> Ex3       15              3000           4000
>
> second one ( table2):
>
> chrom      location        name
> 2                     160              Alv
> 2                    190               GNN
> 2                    100               ARg
> 10                  210               GGG
> 15                 3200             ADSA
>
>   What I have to do is to put name column in table1  when  the location of
> the name  is between the start and end ....and chrom must be the same....it
> will be this the result:
>
> ID     chrom        start         end               name
> Ex1       2                152          180               Alv
> Ex2       10              2000          2220         GGG
> Ex3       15              3000           4000         ADSA
>
>
> How can i do this ????

Install the Bioconductor package GenomicRanges

   install.packages("BiocManager")
   BiocManager::install("GenomicRanges")

then

library(GenomicRanges)
t1 <- GRanges(c("2", "10", "15"),
               IRanges(c(152, 2000, 3000),
                       c(180, 2220, 4000)),
               Id=c("Ex1", "Ex2", "Ex3"))
t2 <- GRanges(c("2", "2", "2", "10", "15"),
               IRanges(c(160, 190, 100, 2010, 3200),
                       width=1),
               Name=c("Alv", "GNN", "ARg", "GGG", "ADSA"))
## index of the first range in t2 overlapping each range in t1 (NA if none)
idx <- findOverlaps(t1, t2, select="first")
values(t1)$Name <- values(t2)$Name[idx]

leading to

 > t1
GRanges with 3 ranges and 2 elementMetadata values:
       seqnames       ranges strand |          Id        Name
          <Rle>    <IRanges>  <Rle> | <character> <character>
   [1]        2 [ 152,  180]      * |         Ex1         Alv
   [2]       10 [2000, 2220]      * |         Ex2         GGG
   [3]       15 [3000, 4000]      * |         Ex3        ADSA
   ---
   seqlengths:
    10 15  2
    NA NA NA
 > as.data.frame(t1)
   seqnames start  end width strand  Id Name
1        2   152  180    29      * Ex1  Alv
2       10  2000 2220   221      * Ex2  GGG
3       15  3000 4000  1001      * Ex3 ADSA
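
The code above keeps at most one name per range. If several locations can fall inside the same range, a sketch along the following lines lists every (range, name) pair instead. The object names `exons`, `sites`, `hits` and `pairs` are illustrative, and the location of GNN is moved from 190 to 170 here (a hypothetical change, not in the original data) so that Ex1 has two overlapping locations.

```r
library(GenomicRanges)

## same ranges as above
exons <- GRanges(c("2", "10", "15"),
                 IRanges(c(152, 2000, 3000),
                         c(180, 2220, 4000)),
                 Id=c("Ex1", "Ex2", "Ex3"))
## ASSUMPTION: GNN moved to 170 so that two locations fall inside Ex1
sites <- GRanges(c("2", "2", "2", "10", "15"),
                 IRanges(c(160, 170, 100, 2010, 3200),
                         width=1),
                 Name=c("Alv", "GNN", "ARg", "GGG", "ADSA"))

## all overlaps, one row per (range, location) pair
hits <- findOverlaps(exons, sites)
pairs <- data.frame(Id=mcols(exons)$Id[queryHits(hits)],
                    Name=mcols(sites)$Name[subjectHits(hits)])
```

Ranges without any overlapping location do not appear in `pairs`.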


GenomicRanges also supports many other sequence-related operations.
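
For small tables the same result can probably be reached in base R without installing anything. The following is only a sketch, not part of the GenomicRanges approach: it joins the two tables on chrom with merge() and then keeps the rows whose location lies within start and end. The data frame names `table1` and `table2` follow the question, `candidates`, `inside` and `result` are illustrative names, and the location of GGG is taken as 2010 (as in the code above) rather than the 210 shown in the question.

```r
table1 <- data.frame(ID=c("Ex1", "Ex2", "Ex3"),
                     chrom=c(2, 10, 15),
                     start=c(152, 2000, 3000),
                     end=c(180, 2220, 4000))
## ASSUMPTION: GGG is at 2010, not 210 as written in the question
table2 <- data.frame(chrom=c(2, 2, 2, 10, 15),
                     location=c(160, 190, 100, 2010, 3200),
                     name=c("Alv", "GNN", "ARg", "GGG", "ADSA"))

## pair every range with every location on the same chromosome
candidates <- merge(table1, table2, by="chrom")

## keep only the locations that fall inside [start, end]
inside <- candidates$location >= candidates$start &
          candidates$location <= candidates$end
result <- candidates[inside, c("ID", "chrom", "start", "end", "name")]
```

Note that merge() builds every same-chromosome combination before filtering, so this does not scale to large tables, and ranges with no matching location are dropped from `result`.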

Hope that helps,

Martin



-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


