[BioC] comparing two tables

Martin Morgan mtmorgan at fhcrc.org
Tue Oct 25 15:48:26 CEST 2011


On 10/25/2011 03:42 AM, Assa Yeroslaviz wrote:
> Hi everybody,
>
> I would like to know whether it is possible to compare two tables for certain
> parameters.
> I have these two tables:
> gene table
> name     chr     start     end     str     accession     Length
> gen1     4     646752     646838     +     MI0005806     86
> gen12     2L     243035     243141     -     MI0005821     106
> gen3     2L     159838     159928     +     MI0005813     90
> gen7     2L     1831685     1831799     -     MI0011290     114
> gen4     2L     2737568     2737661     +     MI0017696     93
> ...
>
> localization table:
> Chr     Start     End     length
> 4     136532     138654     2122
> 3     139870     141970     2100
> 2L     157838     158440     602
> X     160834     162966     2132
> 4     204040     208536     4496
> ...
>
> I would like to check whether a specific gene lies within a certain region.
> For example I want to see if gene 3 on chromosome 2L lies within the region
> given in the second table.

Hi Assa --

In Bioconductor, use the GenomicRanges package. Create two GRanges objects

   genes = with(genetable, GRanges(chr, IRanges(start, end), str,
                                   accession=accession, Length=Length))
   locations = with(locationtable, GRanges(Chr, IRanges(Start, End)))

then

   olaps = findOverlaps(genes, locations)

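queryHits(olaps) and subjectHits(olaps) are parallel integer vectors:
element i pairs the index of a gene with the index of a location it
overlaps, so a gene appears once for every location it overlaps. The
definition of 'overlap' is flexible (for instance type="within" requires
the gene to lie entirely inside the location), see ?findOverlaps.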
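
As a minimal sketch of the steps above, assuming the rows shown in the
question sit in data frames called genetable and locationtable, the whole
thing might look like the following. The last location row
(2L:159000-160000) is made up purely for illustration, because none of
the rows shown in the question actually overlap, and type="within" is
only one possible choice of overlap definition.

```r
library(GenomicRanges)

# Rows copied from the question
genetable = data.frame(
    name = c("gen1", "gen12", "gen3", "gen7", "gen4"),
    chr = c("4", "2L", "2L", "2L", "2L"),
    start = c(646752, 243035, 159838, 1831685, 2737568),
    end = c(646838, 243141, 159928, 1831799, 2737661),
    str = c("+", "-", "+", "-", "+"),
    accession = c("MI0005806", "MI0005821", "MI0005813",
                  "MI0011290", "MI0017696"),
    Length = c(86, 106, 90, 114, 93),
    stringsAsFactors = FALSE)

# Rows copied from the question, plus one HYPOTHETICAL row (the last
# one, 2L:159000-160000) added so that the example produces a hit
locationtable = data.frame(
    Chr = c("4", "3", "2L", "X", "4", "2L"),
    Start = c(136532, 139870, 157838, 160834, 204040, 159000),
    End = c(138654, 141970, 158440, 162966, 208536, 160000),
    stringsAsFactors = FALSE)

genes = with(genetable, GRanges(chr, IRanges(start, end), str,
                                accession=accession, Length=Length))
locations = with(locationtable, GRanges(Chr, IRanges(Start, End)))

# Keep only genes that lie entirely inside a location
olaps = findOverlaps(genes, locations, type="within")

# Names of the genes that were hit, one entry per gene/location pair
hit_names = genetable$name[queryHits(olaps)]
```

Here hit_names picks out gen3, and subjectHits(olaps) gives the row of
locationtable that contains it. With the default type="any" a gene is
reported as soon as it shares a single base with a location.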

Martin


>
> What I would like to do is:
> 1. check if the gene lies on a specific chromosome
> 1.a if no - go to the next line
> 1.b if yes - go to 2
> 2. check if the start position of the gene is bigger than the start position
> of the localization table AND if it is smaller than the end position (if it
> lies between the start and end positions in the localization table)
> 2.a if no - go to the next gene
> 2.b if yes - give it to me.
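
For comparison, steps 1 and 2 as written can also be sketched in base R
without any explicit loop: merge() pairs each gene with the locations on
the same chromosome, and a logical subset keeps the pairs that pass the
position test. This is only a sketch under assumptions: the data frame
names are illustrative, only the columns needed are included, and the
last location row (2L:159000-160000) is made up so that something is
returned.

```r
# Rows from the question (only the columns needed here)
genetable = data.frame(
    name = c("gen1", "gen12", "gen3", "gen7", "gen4"),
    chr = c("4", "2L", "2L", "2L", "2L"),
    start = c(646752, 243035, 159838, 1831685, 2737568),
    end = c(646838, 243141, 159928, 1831799, 2737661),
    stringsAsFactors = FALSE)

# Rows from the question plus one HYPOTHETICAL row (the last one)
locationtable = data.frame(
    Chr = c("4", "3", "2L", "X", "4", "2L"),
    Start = c(136532, 139870, 157838, 160834, 204040, 159000),
    End = c(138654, 141970, 158440, 162966, 208536, 160000),
    stringsAsFactors = FALSE)

# Step 1: pair every gene with every location on the same chromosome
pairs = merge(genetable, locationtable, by.x="chr", by.y="Chr")

# Step 2: keep pairs where the gene start lies strictly between the
# location's Start and End
hits = pairs[pairs$start > pairs$Start & pairs$start < pairs$End, ]
```

The merge builds every gene/location pair per chromosome, so this is
fine for small tables but can become large for genome-sized ones.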
>
> I was having difficulties doing it without running into three interleaved
> conditional loops (if).
>
> I would appreciate any help.
>
> Thanks
>
> Assa
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


