[BioC] Re(surrecting): [Bioc-sig-seq] Rsamtools countBam labeling

Cook, Malcolm MEC at stowers.org
Thu Mar 31 21:56:24 CEST 2011


!! Please pardon the line-breaks MS Outlook auto-inserted into my code in a previous version of this post.
 

Martin,

Hmmm, Thanks .... I think I'm getting it....

Following your lead, I can directly re-order cnt in the OriginalOrder as:

> cnt[sort(unlist(split(values(which2), seqnames(which2)))$OriginalOrder,index.return=TRUE)$ix,]
  space start  end width    file records nucleotides
2  seq2   100 1000   901 ex1.bam    1169       41235
1  seq1  1000 2000  1001 ex1.bam     612       21549
3  seq2  1000 2000  1001 ex1.bam     642       22640


... which can usefully(?) be abstracted as

countBamWhich <- function (file,which,index=file,...) {
### wrapper for countBam that reorders its results to aggree with
### 'which', a required ScanBamParam.  ... params are additional
### ScanBamParam options.
###
### ASSUMES: internal implementation detail of ScanBamParam. c.f.
### http://permalink.gmane.org/gmane.science.biology.informatics.conductor/34208)
  param <- ScanBamParam(which=which,...)
  values(which)[['OriginalOrder']] <- 1:length(which)
  CBW = countBam(file,index,param=ScanBamParam(which=which))
  CBW[sort(unlist(split(values(which),
                        seqnames(which)))$OriginalOrder,index.return=TRUE)$ix,]
}

allowing me to write

> countBamWhich(fl, which2)
  space start  end width    file records nucleotides
2  seq2   100 1000   901 ex1.bam    1169       41235
1  seq1  1000 2000  1001 ex1.bam     612       21549
3  seq2  1000 2000  1001 ex1.bam     642       22640

All in favor?

~ Malcolm




Malcolm Cook
Stowers Institute for Medical Research -  Bioinformatics
Kansas City, Missouri  USA
 
 

> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org] 
> Sent: Thursday, March 31, 2011 11:41 AM
> To: Cook, Malcolm
> Cc: 'myoung at wehi.EDU.AU'; 'bioconductor at r-project.org'; 
> 'Bioc-sig-sequencing at r-project.org'
> Subject: Re: [BioC] Re(surrecting): [Bioc-sig-seq] Rsamtools 
> countBam labeling
> 
> On 03/31/2011 08:32 AM, Cook, Malcolm wrote:
> > Matt et. al.,
> >
> > I wonder if a satisfactory resolution to the issue of "the order
> > changes between the GRanges object and the countBam data.frame
> >
> > 
> http://www.mail-archive.com/bioc-sig-sequencing@r-project.org/
> msg01144.html
> >
> >  I am presented with the same issue and poised to tackle it 
> but wondr
> > if a generic solution emerged from you inquiries&  efforts.
> 
> Hi Malcolm --
> 
> For a reproducible example,
> 
>    library(Rsamtools)
>    example(countBam)
>    which1 <- as(which, "GRanges")
>    ## which2 might be where your data actually starts
>    which2 <- which1[c(2,1,3)]
>    values(which2)[["OriginalOrder"]] <- 1:3
>    param <- ScanBamParam(which=which2)
>    cnt <- countBam(fl, param=param)
> 
> What happens is that ScanBamParam converts its argument to an 
> IRangesList, using split(ranges(which2), seqnames(which2)). So do the 
> same for the values and unlist
> 
>    cntVals <- unlist(split(values(which2), seqnames(which2)))
> 
> then cbind coerced values
> 
>    cbind(cnt, as.data.frame(cntVals))
> 
> with
> 
>  > which2
> GRanges with 3 ranges and 1 elementMetadata value
>      seqnames       ranges strand | OriginalOrder
>         <Rle>    <IRanges>  <Rle> |     <integer>
> [1]     seq2 [ 100, 1000]      * |             1
> [2]     seq1 [1000, 2000]      * |             2
> [3]     seq2 [1000, 2000]      * |             3
> 
> seqlengths
>   seq1 seq2
>     NA   NA
>  > cbind(cnt, as.data.frame(cntVals))
>    space start  end width    file records nucleotides OriginalOrder
> 1  seq1  1000 2000  1001 ex1.bam     612       21549             2
> 2  seq2   100 1000   901 ex1.bam    1169       41235             1
> 3  seq2  1000 2000  1001 ex1.bam     642       22640             3
> 
> Martin
> 
> >
> > Thanks,
> >
> > Malcolm Cook Stowers Institute for Medical Research -
> > Bioinformatics Kansas City, Missouri  USA
> >
> >
> > _______________________________________________ Bioconductor mailing
> > list Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
> > archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 
> -- 
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> 
> Location: M1-B861
> Telephone: 206 667-2793
> 


More information about the Bioconductor mailing list