[BioC] annotating reads according to position on mapping

Joern Toedling Joern.Toedling at curie.fr
Fri May 14 17:31:22 CEST 2010


Hello,

I leave it to the IRanges developers to point out the quickest way to find
such overlaps using IRanges, but my guess is that you need to create
'RangedData' objects and then use the function findOverlaps.

However, sorry for the shameless plug, the package 'girafe' from the latest
Bioconductor release can also be used to answer such kinds of questions. Have
a look at the vignette for some use cases. Basically you need to create two
objects:
1. an object of class 'AlignedGenomeIntervals' from your aligned sequences.
The manual page of that class and the vignette show how to do this; it is
straightforward given the data.frame that you get when you read your table
into R using read.table.
2. an object of class 'Genome_intervals_stranded' of your genomic annotation.
For example, the function 'readGff3' from package 'genomeIntervals' can be
used to create such an object from a gff (version 3) file containing such
annotation.
When you have those two objects, the function 'interval_overlap' will give you
overlaps of any kind (>= 1nt) between those two, and 'fracOverlap' can be used
to get overlaps based on additional restrictions that you specify.
How to use 'girafe' for finding overlaps is also shown in the vignette.
And there is also a coercion method between AlignedGenomeIntervals objects and
RangedData for using IRanges methods later on.
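
As a rough illustration of what such an overlap query computes, here is a
minimal, package-independent sketch in Python. It is not the girafe or IRanges
implementation. The Interval type, the function names, the toy coordinates and
the 'min_frac' parameter are all hypothetical, and the sketch assumes 1-based
inclusive coordinates and that two intervals only overlap when chromosome and
strand both match.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical stand-in for one aligned read or one annotated feature."""
    name: str
    chrom: str
    strand: str
    start: int  # 1-based, inclusive (assumption)
    end: int    # inclusive (assumption)


def overlap_length(a: Interval, b: Interval) -> int:
    """Number of shared positions; 0 if chromosome or strand differ."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_overlaps(reads, features, min_frac=0.0):
    """Return (read, feature, shared nt) for every pair sharing >= 1 nt.

    min_frac additionally requires that at least this fraction of the
    read lies inside the feature (an illustrative restriction only).
    """
    hits = []
    for r in reads:
        read_len = r.end - r.start + 1
        for f in features:
            n = overlap_length(r, f)
            if n >= 1 and n / read_len >= min_frac:
                hits.append((r.name, f.name, n))
    return hits


# Toy data, invented for this example.
reads = [
    Interval("read1", "chr1", "+", 100, 121),
    Interval("read2", "chr1", "+", 195, 216),
    Interval("read3", "chr2", "-", 50, 70),
]
features = [
    Interval("miRNA_A", "chr1", "+", 90, 130),
    Interval("exon_B", "chr1", "+", 210, 400),
    Interval("exon_C", "chr2", "+", 40, 80),
]

any_overlap = find_overlaps(reads, features)                 # >= 1 nt shared
mostly_inside = find_overlaps(reads, features, min_frac=0.9) # stricter filter
print(any_overlap)
print(mostly_inside)
```

The nested loop compares every read with every feature, which is fine for a
toy example but far too slow for a whole genome; dedicated packages rely on
sorted or indexed interval structures for that.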

Hope that helps,
Joern

PS: There is an additional mailing list 'bioc-sig-sequencing' which may be
more appropriate for this kind of question.

On Fri, 14 May 2010 13:43:15 +0100, Andreia Fonseca wrote
> Dear List,
> 
> I have a file with the hits of my small RNA sequences (18-30bp) 
> in the human genome, and I have downloaded all the annotation of 
> the human genome from UCSC. What I want is to annotate my sequences 
> by finding overlaps between the positions of my sequences and the information
> available in the tables I have downloaded from UCSC. The 
> file that maps my sequences to the human genome (produced using MicroRazerS) 
> has the following structure:
> 
> sequence, sequence length, strand, chromosome, start, end, score,
> alignment length
> 
> I don't want to do this with biomart, because making all the queries 
> would be too slow. However, I have found the package IRanges,
> which has the overlap function, but I do not understand how the 
> two tables - the query and the target tables - should be stored and 
> how to compute the overlaps. Can someone give me a hint?
> With kind regards, Andreia
> 
> -- 
> --------------------------------------------
> Andreia J. Amaral
> Unidade de Imunologia Clínica
> Instituto de Medicina Molecular
> Universidade de Lisboa
> email: andreiaamaral at fm.ul.pt
>          andreia.fonseca at gmail.com
> 


---
Joern Toedling
Institut Curie -- U900
26 rue d'Ulm, 75005 Paris, FRANCE
Tel. +33 (0)156246927


