[BioC] Approaches to building Rle sequences from ranges

Johnston, Jeffrey jjj at stowers.org
Fri Jun 6 22:11:58 CEST 2014


For GRanges with a metadata column, you can do:

coverage(granges, weight="variable")

This will produce an RleList where the value at each coordinate is the sum of the “variable” metadata column of all overlapping ranges. I think this will work for your use case if your ranges do not overlap.
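
A minimal sketch of the call above on toy data may help. The object name `gr`, the coordinates, the values, and the two 5-bp bins are all made up for illustration; only `coverage()`, `Views()`, and `viewSums()` are the real functions discussed in this thread. Note that positions not covered by any range come back as 0 rather than NA, so a plain per-bin mean would be diluted by those zeros. One way around that, assumed here rather than taken from the thread, is to divide the per-bin sum of weights by the per-bin count of ranges.

```r
library(GenomicRanges)

# Toy GRanges with 1-width ranges and a "variable" metadata column.
# All names and numbers here are hypothetical.
gr <- GRanges(
  seqnames   = c("chr1", "chr1", "chr2"),
  ranges     = IRanges(start = c(2, 5, 3), width = 1),
  variable   = c(0.4, 0.7, 0.1),
  seqlengths = c(chr1 = 10L, chr2 = 6L)
)

# One numeric Rle per sequence, each as long as its seqlength.
# The value at a position is the sum of "variable" over the ranges covering it.
cvg <- coverage(gr, weight = "variable")

# Unweighted coverage: the number of ranges at each position.
n <- coverage(gr)

# Two hypothetical 5-bp bins on chr1.
bins <- IRanges(start = c(1, 6), width = 5)

# Per-bin sum of the variable and per-bin number of measured positions.
sums   <- viewSums(Views(cvg$chr1, bins))
counts <- viewSums(Views(n$chr1, bins))

# Mean over measured positions only; NaN where a bin holds no ranges.
bin_means <- sums / counts
```

Because seqlengths are set on `gr`, the result covers every sequence in a single call, which is the multi-chromosome behaviour asked about in the quoted message.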

-Jeff

On Jun 6, 2014, at 2:59 PM, Vince S. Buffalo <vsbuffalo at ucdavis.edu> wrote:

> Hi All,
> 
> I'm thinking there might be a clever way to do something that I'm not aware
> of. The setup is that I frequently find myself using Views and
> viewMeans, viewSums, etc. to calculate summary statistics by tiles on
> sequences. I have a GRanges object with 1-width ranges (but this should
> apply more generally), and metadata columns have some measurement (GC
> content, pairwise diversity, some quality metric, etc). I need to go from a
> quantitative variable tied to specific ranges to an Rle sequence across an
> entire chromosome to use Views/viewMeans (e.g. the binnedAverages example
> in the How to). Right now I approach this with something like:
> 
> data <- Rle(NA, length=seqlengths(txdb)['chr1'])
> data[start(my_rngs)] <- my_rngs$variable # simple, since my features are 1-width
> 
> # or more generally:
> data2 <- Rle(NA, length=seqlengths(txdb)['chr1'])
> data2[ranges(my_rngs)] <- my_rngs$variable
> identical(as.vector(data), as.vector(data2)) # returns TRUE (contingent on all widths = 1)
> 
> Then, I can convert these to Views on a set of bins/tiles created with
> tileGenome, and use the viewMeans, viewSums, etc. functions (removing NAs).
> 
> So my question is: are there better methods for creating a sequence-length
> Rle from metadata columns? Put another way, is there a way to take some
> metadata column corresponding to ranges and map it to coordinate space
> (maybe in one call)? It seems like if seqlengths is set in the GRanges
> object, there's sufficient information to go directly from a GRanges
> metadata column to an Rle vector (and I might be missing a more obvious
> solution). My example assumed a single chromosome, but an approach that
> knows to handle multiple sequences through RleLists seems like it would be
> helpful.
> 
> thanks,
> Vince
> 
> PS: My apologies if you've received this message twice; I had to resend
> it after apparently sending it to the wrong list.
> 
> -- 
> Vince Buffalo
> Ross-Ibarra Lab (www.rilab.org)
> Plant Sciences, UC Davis