[R] how to search to value to another table

Mon Jan 31 20:53:40 CET 2011

On Mon, Jan 31, 2011 at 05:35:35PM +0100, Mauluda Akhtar wrote:
> Hello,
> 
> I'm a new R user.
> 
> I have two different dummy tables with the variable name tb1 and tb2.

Hello.

First, let me put your data into an R command using dput().

  tb1 <-
  structure(list(V1 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
  2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("chr1", 
  "chr2", "chr3"), class = "factor"), V2 = c(22L, 36L, 54L, 77L, 
  80L, 85L, 99L, 105L, 120L, 130L, 140L, 150L, 172L, 177L, 190L, 
  200L, 220L, 300L, 310L), V3 = c(23L, 37L, 55L, 78L, 81L, 86L, 
  100L, 106L, 121L, 131L, 141L, 151L, 173L, 178L, 191L, 201L, 221L, 
  301L, 311L), V4 = c(3L, 1L, 0L, 1L, 4L, 0L, 1L, 0L, 1L, 1L, 0L,
  5L, 0L, 1L, 6L, 8L, 0L, 9L, 10L)), .Names = c("V1", "V2", "V3",
  "V4"), class = "data.frame", row.names = c(NA, -19L))

  tb2 <-
  structure(list(V1 = structure(c(1L, 1L, 2L, 2L, 3L), .Label = c("chr1",
  "chr2", "chr3"), class = "factor"), V2 = c(20L, 70L, 90L, 130L, 
  190L), V3 = c(40L, 80L, 110L, 140L, 230L)), .Names = c("V1", 
  "V2", "V3"), class = "data.frame", row.names = c(NA, -5L))

> 
>  The first column is the common field of the both tables. In the first table
> column v3 is always v2+1 while in the second table v2 and v3 hold the range.
> I want to know which rows of tb1 intercept with the range between v2 and v3
> of tb3. I tried but I failed to solve the problem.

I am not sure, whether i understand correctly, what you want. If the
intervals in tb2 have different names, then i would expect that the
values in tb1 should be compared to the row in tb2 with the same name.
However, the names in tb2 are not unique. Can you provide an example
of the required output?

Is the following close to what you expect?

  tb1$index <- 0
  for (i in seq.int(length=nrow(tb2))) {
      tb1$index[tb2$V2[i] <= tb1$V2 & tb1$V3 <= tb2$V3[i]] <- i
  }

  tb1

       V1  V2  V3 V4 index
  1  chr1  22  23  3     1
  2  chr1  36  37  1     1
  3  chr1  54  55  0     0
  4  chr1  77  78  1     2
  5  chr2  80  81  4     0
  6  chr2  85  86  0     0
  7  chr2  99 100  1     3
  8  chr2 105 106  0     3
  9  chr2 120 121  1     0
  10 chr2 130 131  1     4
  11 chr2 140 141  0     0
  12 chr2 150 151  5     0
  13 chr3 172 173  0     0
  14 chr3 177 178  1     0
  15 chr3 190 191  6     5
  16 chr3 200 201  8     5
  17 chr3 220 221  0     5
  18 chr3 300 301  9     0
  19 chr3 310 311 10     0

  tb2

      V1  V2  V3
  1 chr1  20  40
  2 chr1  70  80
  3 chr2  90 110
  4 chr2 130 140
  5 chr3 190 230

The column tb1$index contains for each row the index of the interval [V2, V3]
in tb2, which contains the values V2, V3 from tb1. For example, line 

  10 chr2 130 131  1     4

of tb1 contains index 4, because the interval

  4 chr2 130 140

in tb2 contains numbers 130 and 131.

Index 0 means that no interval in tb2 contains both the numbers V2, V3 in the
given row of tb1.

Hope this helps.

Petr Savicky.