[BioC] How to convert from IRanges(List) to Rle(List)

Cook, Malcolm MEC at stowers.org
Mon Apr 9 19:34:47 CEST 2012


 
> On Mon, Apr 9, 2012 at 8:42 AM, Valerie Obenchain
> <vobencha at fhcrc.org> wrote:
> 
> > On 04/07/2012 10:45 PM, Michael Lawrence wrote:
> >
> >
> >
> > On Sat, Apr 7, 2012 at 7:31 PM, Valerie Obenchain
> > <vobencha at fhcrc.org> wrote:
> >
> >> On 04/07/12 16:30, Michael Lawrence wrote:
> >>
> >>> On Sat, Apr 7, 2012 at 11:12 AM, Martin Morgan <mtmorgan at fhcrc.org>
> >>> wrote:
> >>>
> >>>   On 04/07/2012 05:39 AM, Nicolas Delhomme wrote:
> >>>>
> >>>>   Hi all,
> >>>>>
> >>>>> I'm just wondering if there would be a direct way to convert an
> >>>>> IRanges to an Rle, as in: as(rng,"Rle"). At the moment, I can convert
> >>>>> my IRanges into an integer vector and cast that as an Rle
> >>>>> (Rle(as.integer(rng))), but that is not extremely efficient on a long
> >>>>> IRangesList (with > 700,000 IRanges in it). It takes ~10 mins with an
> >>>>> sapply.
> >>>>>
> >>>>> Why I want that is for the following: I have an IRangesList of
> >>>>> transcripts (describing exons at the genome level) and for every one,
> >>>>> I have a bp position at the transcript level that I want to convert
> >>>>> into a genomic bp position. Basically, I need to be able to convert a
> >>>>> given transcript coordinate into the corresponding genomic
> >>>>> coordinate. My IRanges contain the genomic coordinates of every
> >>>>> transcript and by converting it into an integer vector, I can select
> >>>>> the right genomic bp coordinate by using the transcript bp coordinate
> >>>>> as an index (as.integer(rng)[transcript.pos]).
> >>>>>
> >>>>>
> >>>>> I considered the IRanges approach because I keep the transcript name
> >>>>> and I'm sure that I'm looking up the right coord in the right
> >>>>> transcript, but I'm open to other suggestions.
> >>>>>
> >>>> Hi Nico -- VariantAnnotation::refLocsToLocalLocs,
> >>>> GenomicFeatures::transcriptLocs2refLocs, and IRanges::map might do
> >>>> this for you; no direct experience on my part, though. Martin
> >>>>
> >>>>
> >>> Right. Right now, IRanges::map will take things from global to local
> >>> (either into transcripts or reads, depending on the argument). This takes
> >>> the place of "refLocsToLocalLocs". What "map" needs to support is the
> >>> reverse. I think we could do this with a new function. I am not sure
> >>> if it should be called reverseMap though, because it's not clear which is
> >>> forward and which is reverse. Maybe we need mapToGlobal and mapToLocal?
> >>> Or maybe "absolute" and "relative" are better terms?
> >>>
> >>> Btw, we are working on an "easier to use" interface for the
> >>> transcriptLocs2refLocs function and that should be integrated with any
> >>> refactoring/renaming.
> >>>
> >> I like the idea of the map generic and where it is going. I think the
> >> mapToGlobal and mapToLocal terms are clearer. I assume that in mapToGlobal
> >> the 'from' would be along the lines of cDNA-based, cds-based, or
> >> protein-based coordinates. In mapToLocal the 'from' would always be
> >> genomic-based coordinates. Yes?
> >>
> >>
> > Yes, that would be the typical use case, although the generic is meant to
> > be more general, i.e., it is in IRanges, not GenomicRanges.
> >
> >
> > OK. You previously mentioned the map generic could be used to convert both
> > between organisms (human reference-based -> pig reference-based) and
> > between coordinate systems within the same organism (human reference-based
> > -> human cds-based). At least I think that's what you had in mind. If this
> > is the case, maybe we need an argument that indicates 'sameOrganism = TRUE'?
> >
> >
> >
> I think it would all depend on the alignment that is provided. We could
> have a Chain method

maybe Reduce ?

--Malcolm

> that could go between assemblies, including between
> species (even though it is not really condoned). The GRangesList method is
> always going to assume that each element represents the alignment of some
> sequence (like a refseq) to the same genome build as "from". The
> compatibility of the genome builds is an easy thing to check, given Seqinfo.
> 
> 
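
As an aside on the original question: the lookup as.integer(rng)[transcript.pos]
can be done without expanding every range into an integer vector, by working
from cumulative exon widths. Below is a minimal base-R sketch of that
arithmetic; it is not the IRanges::map interface discussed above. The exon
coordinates and the helper name tx_to_genome are made up for illustration, and
the sketch assumes a plus-strand transcript whose exons are sorted by start.

```r
# Hypothetical exons of one plus-strand transcript, sorted by start
# (illustrative values, not taken from the thread).
exon_start <- c(101L, 201L, 301L)
exon_end   <- c(105L, 203L, 304L)

# tx_to_genome() is a hypothetical helper, not part of IRanges.
# It maps 1-based transcript positions to genomic positions.
tx_to_genome <- function(tx_pos, exon_start, exon_end) {
  widths  <- exon_end - exon_start + 1L
  cum_end <- cumsum(widths)          # last transcript position of each exon
  stopifnot(all(tx_pos >= 1L), all(tx_pos <= cum_end[length(cum_end)]))
  # Index of the exon containing each transcript position
  exon_idx <- findInterval(tx_pos - 1L, cum_end) + 1L
  # Number of transcript bases that precede the containing exon
  before <- cum_end[exon_idx] - widths[exon_idx]
  exon_start[exon_idx] + (tx_pos - before - 1L)
}

tx_pos <- c(1L, 5L, 6L, 12L)
by_arithmetic <- tx_to_genome(tx_pos, exon_start, exon_end)

# Same lookup via full expansion, analogous to as.integer(rng)[tx_pos]
expanded <- unlist(Map(seq.int, exon_start, exon_end))
by_expansion <- expanded[tx_pos]

# Both approaches should agree
stopifnot(identical(by_arithmetic, by_expansion))
```

For a whole IRangesList the same arithmetic would be applied per transcript.
Minus-strand transcripts would need the exon order reversed, which this sketch
does not handle.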


