[BioC] reducing RangedData

Robert Castelo robert.castelo at upf.edu
Thu Oct 22 18:31:58 CEST 2009


hi Patrick,

thanks a lot, the concise code snippet below is what i was looking for,
i'd had written at least twice as much to do the same job :)

regarding whether it would be nice to have a reduction operation with a
signature that includes other stuff i think that for the particular case
of the strand it is definitely useful to project (reduce in RangedData
terminology) genomic coordinates on genomic space taking it into account
because many of the functional elements in the genome are stranded and
projecting doesn't imply always that one is not interested in the strand
of the projected intervals.

with respect to whether other information contained in additional
columns should somehow be "reduced" i think it would suffice to enable
the reduction of the "names" of a RangedData (taking, for instance,
arbitrarily one name between two overlapping ranges) because those names
would allow to go back to the original (un-reduced) data and pull
whatever we're interested in (actually, including the strand).

my two-cents about this, and thanks again for your quick help.

robert.

On Thu, 2009-10-22 at 00:58 -0700, Patrick Aboyoun wrote:
> Robert,
> I wrote you some code that you can use below. I've seen other reduction 
> operations where the user wanted to reduce a RangedData type object by 
> more than one column (e.g. strand and score), so perhaps an appropriate 
> signature for a reduce method for RangedData would be
> 
> reduce(x, by, ...)
> 
> If this method where written, the remaining question would be should it 
> support reducing transformations of the remaining values columns or 
> should non "by" columns be dropped from the output. I'm inclined towards 
> the latter, since a reduce(x, by, valuesFUN) method wouldn't be much 
> simpler than the rdapply method given below. Would a straight-forward 
> reduce(x = rd, by = "strand") approach for RangedData meet your needs?
> 
> 
>  > suppressMessages(library(IRanges))
>  > sta <- c(1, 2, 5, 6, 2, 3, 4, 5)
>  > end <- c(3, 4, 5, 7, 2, 4, 6, 7)
>  > str <- rep(c("+", "+", "-", "-"), 2)
>  > chr <- rep(c("chr1", "chr2"), each = 4)
>  > rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr)
>  > rd
> RangedData: 8 ranges by 1 columns
> columns(1): strand
> sequences(2): chr1 chr2
>  >
>  > # Interesting bit starts here
>  > reduceStranded <- function(x) {
> +     strand <- x$strand
> +     rngs <- ranges(x)[[1]]
> +     ans <- IRangesList("+" = reduce(rngs[strand == "+"]),
> +                        "-" = reduce(rngs[strand == "-"]))
> +     RangedData(unlist(ans, use.names = FALSE),
> +                strand = rep(c("+", "-"), elementLengths(ans)),
>                  space = names(x))
> + } 
>  > params <- RDApplyParams(rd, reduceStranded, reducerFun = function(x) 
> do.call(c, unname(x)))
>  > rdapply(params)
> RangedData: 4 ranges by 1 columns
> columns(1): strand
> sequences(2): chr1 chr2
>  > as.data.frame(rdapply(params))
>   space start end width strand
> 1  chr1     1   4     4      +
> 2  chr1     5   7     3      -
> 3  chr2     2   4     3      +
> 4  chr2     4   7     4      -
>  > sessionInfo()
> R version 2.9.2 Patched (2009-08-24 r49420)
> i386-apple-darwin9.8.0
> 
> locale:
> C/en_US.UTF-8/C/C/C/C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base    
> 
> other attached packages:
> [1] IRanges_1.2.3
> 
> 
> 
> 
> Robert Castelo wrote:
> > dear list,
> >
> > i'd like to project a set of exons on genomic space but keeping
> > the strand information. let's assume for simplicity that no exons
> > overlap in opposite strands so no conflicts need to be resolved
> > regarding that case. in principle this is easy without keeping the
> > strand using a RangeList and the reduce method. however i've been
> > struggling for a while without success to figure out how can i
> > "reduce" my coordinates without loosing the strand information.
> >
> > my guess is that to carry out the strand information i need to
> > use a RangedData object:
> >
> > sta <- c(1, 1, 1)
> > end <- c(3, 4, 5)
> > str <- c("+", "+", "-")
> > rd <- RangedData(IRanges(start=sta, end=end), strand=str, space=chr)
> > rd
> > RangedData: 3 ranges by 1 columns
> > columns(1): strand
> > sequences(2): chr1 chr2
> >
> > and then use rdapply to reduce on each of the chromosomes but i
> > either don't know how to directly apply reduce:
> >
> > params <- RDApplyParams(rd, reduce)
> > rdapply(params)
> > Error in function (classes, fdef, mtable)  :
> >   unable to find an inherited method for function "reduce", for 
> > signature "RangedData"
> >
> >
> > or i loose the strand information:
> >
> > params <- RDApplyParams(rd, function(rd) reduce(ranges(rd)))
> > rdapply(params)
> > $chr1
> > RangesList: 1 range
> > 1: 1:4
> >
> > $chr2
> > RangesList: 1 range
> > 1: 1:5
> >
> > thanks for your help!
> > robert.
> >
> >
> >
> > sessionInfo()
> > R version 2.9.1 (2009-06-26)
> > i386-apple-darwin8.11.1
> >
> > locale:
> > en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> >  [1] rtracklayer_1.4.1                     RCurl_0.98-1
> >  [3] bitops_1.0-4.1                        GOstats_2.10.0
> >  [5] graph_1.22.2                          Category_2.10.1
> >  [7] geneplotter_1.22.0                    lattice_0.17-25
> >  [9] limma_2.18.2                          bgSpJncArrayDm1.db_1.0.1
> > [11] org.Dm.eg.db_2.2.11                   RSQLite_0.7-1
> > [13] DBI_0.2-4                             
> > BSgenome.Dmelanogaster.UCSC.dm1_1.0.0
> > [15] BSgenome_1.12.3                       RColorBrewer_1.0-2
> > [17] Biostrings_2.12.8                     IRanges_1.2.3
> > [19] annotate_1.22.0                       AnnotationDbi_1.6.1
> > [21] Biobase_2.4.1                         xtable_1.5-5
> >
> > loaded via a namespace (and not attached):
> > [1] genefilter_1.24.2 GO.db_2.2.11      grid_2.9.1        GSEABase_1.6.1
> > [5] RBGL_1.20.0       splines_2.9.1     survival_2.35-4   tools_2.9.1
> > [9] XML_2.5-3
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: 
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
>



More information about the Bioconductor mailing list