[BioC] Transformation of Grange object to density per bin

Martin Morgan mtmorgan at fhcrc.org
Tue Jan 22 00:23:11 CET 2013


On 01/20/2013 12:45 PM, Hermann Norpois wrote:
> I am sorry, but I still have a problem. With your syntax I get a vector
> that is going over all chromosomes. How do I get , for instance separate
> vectors for each chromosomes?

I wrote Dario's original solution as:

Get seqlengths (probably these should have been added to testset.gr when it was 
created, e.g., coverage(BamFile("myBam"))

   library(BSgenome.Hsapiens.UCSC.hg19)
   seqlengths(testset.gr) <- seqlengths(Hsapiens)[seqlevels(testset.gr)]

Calculate coverage

   Gcoverage <- coverage(testset.gr)

Here's Dario's function, revised to take a width argument and for the first 
argument, 'x', to be the coverage of a single chromosome

   FUN <- function(x, y, width=200) {
       starts <- seq(1, y, width)
       mean(Views(x, start = starts, width=width), na.rm = TRUE)
   }

Then the calculation over all chromosomes

   coverageMeans <-
       unlist(Map(FUN, Gcoverage, seqlengths(testset.gr)), use.names=FALSE)

And then to get separate vectors for each chromosome, it's just a matter of not 
unlisting

   binsByChr <- Map(FUN, Gcoverage, seqlengths(testset.gr))

with, e.g.,

   binsByChr$chr1

But maybe I'm not understanding what you're after?

Martin

> Thanks
>
> 2013/1/17 Hermann Norpois <hnorpois at googlemail.com>
>
>> Hello,
>>
>> I would like to transform a GRange-object into something with density
>> information related of bins, e.g. 1 bin=200bps. So it starts at poistion 1
>> etc ...
>> That means (according to testset.gr):
>> bin 1    0
>> bin 2    0
>> ...
>> bin 50  0
>> bin 51    0.28        #10000-10200 => (200-144)/200
>> bin 52    0.765        #10200-10400 => 153/200
>> ...
>> etc
>> according to testset.gr (see below).
>>
>> How does it work? Is there a special function?
>> Thanks
>> Hermann
>>
>>
>>> testset.gr
>> GRanges with 20 ranges and 0 metadata columns:
>>         seqnames           ranges strand
>>            <Rle>        <IRanges>  <Rle>
>>     [1]     chr1 [ 10144,  10353]      *
>>     [2]     chr1 [ 10441,  10463]      *
>>     [3]     chr1 [235633, 235766]      *
>>     [4]     chr1 [237717, 237895]      *
>>     [5]     chr1 [521444, 521624]      *
>>     [6]     chr1 [540609, 540786]      *
>>     [7]     chr1 [564495, 564672]      *
>>     [8]     chr1 [565254, 566081]      *
>>     [9]     chr1 [566537, 567272]      *
>>     ...      ...              ...    ...
>>    [12]     chr1 [569610, 570190]      *
>>    [13]     chr1 [601057, 601196]      *
>>    [14]     chr1 [713227, 713475]      *
>>    [15]     chr1 [752533, 752578]      *
>>    [16]     chr1 [752654, 752695]      *
>>    [17]     chr1 [754405, 754539]      *
>>    [18]     chr1 [755336, 755918]      *
>>    [19]     chr1 [756027, 756458]      *
>>    [20]     chr1 [756793, 756843]      *
>>    ---
>>    seqlengths:
>>      chr1 chr10 chr11 chr12 chr13 chr14 ...  chr6  chr7  chr8  chr9  chrX
>> chrY
>>        NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA
>> NA    NA
>>> dput (testset.gr)
>> new("GRanges"
>>      , seqnames = new("Rle"
>>      , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12",
>> "chr13",
>> "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2",
>> "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7",
>> "chr8", "chr9", "chrX", "chrY"), class = "factor")
>>      , lengths = 20L
>>      , elementMetadata = NULL
>>      , metadata = list()
>> )
>>      , ranges = new("IRanges"
>>      , start = c(10144L, 10441L, 235633L, 237717L, 521444L, 540609L,
>> 564495L,
>> 565254L, 566537L, 567450L, 568038L, 569610L, 601057L, 713227L,
>> 752533L, 752654L, 754405L, 755336L, 756027L, 756793L)
>>      , width = c(210L, 23L, 134L, 179L, 181L, 178L, 178L, 828L, 736L, 453L,
>> 545L, 581L, 140L, 249L, 46L, 42L, 135L, 583L, 432L, 51L)
>>      , NAMES = NULL
>>      , elementType = "integer"
>>      , elementMetadata = NULL
>>      , metadata = list()
>> )
>>      , strand = new("Rle"
>>      , values = structure(3L, .Label = c("+", "-", "*"), class = "factor")
>>      , lengths = 20L
>>      , elementMetadata = NULL
>>      , metadata = list()
>> )
>>      , elementMetadata = new("DataFrame"
>>      , rownames = NULL
>>      , nrows = 20L
>>      , listData = structure(list(), .Names = character(0))
>>      , elementType = "ANY"
>>      , elementMetadata = NULL
>>      , metadata = list()
>> )
>>      , seqinfo = new("Seqinfo"
>>      , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14",
>> "chr15",
>> "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21",
>> "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9",
>> "chrX", "chrY")
>>      , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_,
>> NA_integer_, NA_integer_, NA_integer_, NA_integer_)
>>      , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> NA, NA,
>> NA, NA, NA, NA, NA, NA, NA, NA, NA)
>>      , genome = c(NA_character_, NA_character_, NA_character_,
>> NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
>> NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
>> )
>> )
>>      , metadata = list()
>> )
>>>
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list