[R] comparing two tables

David Winsemius dwinsemius at comcast.net
Tue Oct 25 15:27:47 CEST 2011


On Oct 25, 2011, at 6:42 AM, Assa Yeroslaviz wrote:

> Hi everybody,
>
> I would like to know whether it is possible to compare two tables for
> certain parameters.
> I have these two tables:
> gene table
> name     chr     start     end     str     accession     Length
> gen1     4     646752     646838     +     MI0005806     86
> gen12     2L     243035     243141     -     MI0005821     106
> gen3     2L     159838     159928     +     MI0005813     90
> gen7     2L     1831685     1831799     -     MI0011290     114
> gen4     2L     2737568     2737661     +     MI0017696     93
> ...
>
> localization table:
> Chr     Start     End     length
> 4     136532     138654     2122
> 3     139870     141970     2100
> 2L     157838     158440     602
> X     160834     162966     2132
> 4     204040     208536     4496
> ...
>
> I would like to check whether a specific gene lies within a certain
> region. For example, I want to see if gene 3 on chromosome 2L lies
> within the region given in the second table.
>

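rd.txt <- function(txt, header=TRUE, ...) {
      # Read a table from a character string; close only the connection
      # opened here rather than every open connection
      con <- textConnection(txt)
      on.exit(close(con))
      read.table(con, header=header, ...) }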
# Data input
  genetable <- rd.txt("name     chr     start     end     str     accession     Length
  gen1     4     646752     646838     +     MI0005806     86
  gen12     2L     243035     243141     -     MI0005821     106
  gen3     2L     159838     159928     +     MI0005813     90
  gen7     2L     1831685     1831799     -     MI0011290     114
  gen4     2L     2737568     2737661     +     MI0017696     93")
  loctable <- rd.txt("Chr     Start     End     length
  4     136532     138654     2122
  3     139870     141970     2100
  2L     157838     158440     602
  X     160834     162966     2132
  4     204040     208536     4496")

# Helper function
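  inregion <- function(vec, locs) {
         # as.numeric() guards against the character coercion done by apply();
         # note that the chromosome is not compared here
         s <- as.numeric(vec[["start"]]); e <- as.numeric(vec[["end"]])
         any(s > locs[[1]] & e <= locs[[2]]) }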
# Test the function
  inregion(genetable[2, ], loctable[, c("Start", "End")])
# [1] FALSE

  apply(genetable, 1, function(x) inregion(x, loctable[, c("Start", "End")]) )
#[1] FALSE FALSE FALSE FALSE FALSE

The logical vector can be used to extract rows from genetable, but it
seems pointless to offer code that produces an empty data frame.

(Wouldn't it have been more sensible to offer a test case that had a
combination that satisfied your requirements?)
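
As a sketch of what such a test case might look like, the base R code below
adds one hypothetical region (2L, 159000 to 160000, not in the original data)
so that gen3 matches. It also compares chromosomes, which the helper above
does not, and it assumes that "within" means start >= Start and end <= End.
The names genes, regions, pairs, hits and in_any_region are made up for
illustration.

```r
# Gene table from the original post (only the columns needed here)
genes <- data.frame(
  name  = c("gen1", "gen12", "gen3", "gen7", "gen4"),
  chr   = c("4", "2L", "2L", "2L", "2L"),
  start = c(646752, 243035, 159838, 1831685, 2737568),
  end   = c(646838, 243141, 159928, 1831799, 2737661),
  stringsAsFactors = FALSE
)

# Localization table from the post, plus one HYPOTHETICAL region (last entry)
# added only so that at least one gene matches
regions <- data.frame(
  Chr   = c("4", "3", "2L", "X", "4", "2L"),
  Start = c(136532, 139870, 157838, 160834, 204040, 159000),
  End   = c(138654, 141970, 158440, 162966, 208536, 160000),
  stringsAsFactors = FALSE
)

# Pair every gene with every region on the same chromosome
pairs <- merge(genes, regions, by.x = "chr", by.y = "Chr")

# Keep the pairs where the gene lies entirely inside the region
hits <- pairs[pairs$start >= pairs$Start & pairs$end <= pairs$End, ]

# Logical vector aligned with the rows of genes, usable for subsetting
in_any_region <- genes$name %in% hits$name
genes[in_any_region, ]
```

The merge() step replaces the nested conditionals: chromosome matching happens
in the join, and the position test is a single vectorised comparison. For
large tables the intermediate pairs data frame can get big, so this is only
meant for modest inputs.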

I'm guessing that this facility is already implemented in one or
more Bioconductor functions.

-- 
David.

> What I would like to do is:
> 1. check if the gene lies on a specific chromosome
> 1.a if no - go to the next line
> 1.b if yes - go to 2
> 2. check if the start position of the gene is bigger than the start
> position of the localization table AND if it is smaller than the end
> position (if it lies between the start and end positions in the
> localization table)
> 2.a if no - go to the next gene
> 2.b if yes - give it to me.
>
> I was having difficulties doing it without running into three
> nested conditionals (if).
>
> I would appreciate any help.
>
> Thanks
>
> Assa

David Winsemius, MD
West Hartford, CT


