[BioC] genome_intervals and GRange objects/HiTC package

Nicolas Servant nservant at curie.fr
Thu Feb 7 16:08:56 CET 2013


Dear all,

Yes, sorry for that; I haven't checked the BioC list for a while.
Indeed, I have made a lot of changes to the HiTC package in the devel branch.

Among the major changes, the package is now based on GenomicRanges
instead of genomeIntervals.
The main reason for this change is to improve the interoperability
of the package with the rest of the Bioconductor project.
Since version 1.3.2, I also distinguish HTCexp objects, each
basically a single interaction map, from the HTClist object (a list
of HTCexp), which is mainly useful for Hi-C data.

I often get direct feedback from users, and the package evolves quite
rapidly according to their needs. So please try the devel
version (v1.3.2, although 1.3.3 should arrive soon) and do not hesitate
to contact me if you have any questions.

Nicolas



On 07/02/2013 15:26, Valerie Obenchain wrote:
> Herman,
>
> You must be using the release version of HiTC. Providing the output of
> sessionInfo() is also helpful when posting a question so others can
> see what versions you are using. I've put mine at the bottom
> of this post.
>
> The Genome_intervals class is present in the release version (1.2.0)
> but not in devel (1.3.2). The author must have decided to use GRanges
> objects instead (I've cc'd them on this email). In the meantime, here
> is an example of how to convert a 'Genome_intervals' to a 'GRanges'.
>
> Using the release version, there is an example of how to create object
> 'i' at the bottom of this man page,
>
> ?'Genome_intervals-class'
>
> > i
> Object of class Genome_intervals
> 2 base intervals and 2 inter-base intervals(*):
> chr01 [1, 2)
> chr01 [3, 5)
> chr02 [4, 6] *
> chr02 [8, 9) *
>
> annotation:
>   seq_name inter_base
> 1    chr01      FALSE
> 2    chr01      FALSE
> 3    chr02       TRUE
> 4    chr02       TRUE
>
> You can create a GRanges from 'i' with the following code,
>
> > GRanges(annotation(i)$seq_name, IRanges(i@.Data[,1], i@.Data[,2]))
> GRanges with 4 ranges and 0 elementMetadata cols:
>       seqnames    ranges strand
>          <Rle> <IRanges>  <Rle>
>   [1]    chr01    [1, 2]      *
>   [2]    chr01    [3, 5]      *
>   [3]    chr02    [4, 6]      *
>   [4]    chr02    [8, 9]      *
>   ---
>   seqlengths:
>    chr01 chr02
>       NA    NA
>
> > gr <- GRanges(annotation(i)$seq_name,
> +               IRanges(i@.Data[,1], i@.Data[,2]))
>
> You can add metadata with values() or elementMetadata(). These two
> functions have been replaced with mcols() in the devel branch.
> > values(gr) <- DataFrame("inter_base"=annotation(i)$inter_base)
> > gr
> GRanges with 4 ranges and 1 elementMetadata col:
>       seqnames    ranges strand | inter_base
>          <Rle> <IRanges>  <Rle> |  <logical>
>   [1]    chr01    [1, 2]      * |      FALSE
>   [2]    chr01    [3, 5]      * |      FALSE
>   [3]    chr02    [4, 6]      * |       TRUE
>   [4]    chr02    [8, 9]      * |       TRUE
>   ---
>   seqlengths:
>    chr01 chr02
>       NA    NA
>
>
> Valerie
>
> > sessionInfo()
> ...
>
> other attached packages:
>  [1] HiTC_1.2.0             girafe_1.8.0           genomeIntervals_1.12.0
>  [4] intervals_0.13.3       ShortRead_1.14.4       latticeExtra_0.6-24
>  [7] RColorBrewer_1.0-5     lattice_0.20-10        Rsamtools_1.8.6
> [10] Biostrings_2.24.1      GenomicRanges_1.8.13   IRanges_1.14.4
> [13] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.24.0 Biobase_2.16.0  bitops_1.0-5    hwriter_1.3
> [5] stats4_2.15.1   tools_2.15.1    zlibbioc_1.2.0
>
>
> On 02/06/13 09:13, Valerie Obenchain wrote:
>> Hi Herman,
>>
>> When you post a question please provide code that reproduces the
>> problem. Your example does not show how 'hic.gr' was created or what
>> kind of object it is.
>>
>> library(HiTC)
>> data(Nora_5C)
>> intervals <- y_intervals(E14sub)
>>
>> The 'intervals' object is a GRanges.
>>
>> > class(intervals)
>> [1] "GRanges"
>> attr(,"package")
>> [1] "GenomicRanges"
>>
>> You can extract metadata columns with 'mcols'.
>>
>> head(mcols(intervals))
>>
>> > head(mcols(intervals))
>> DataFrame with 6 rows and 4 columns
>>          name     score     itemRgb                thick
>>   <character> <numeric> <character>            <IRanges>
>> 1       FOR_3         0     #0000FF [98834146, 98837506]
>> 2       FOR_5         0     #0000FF [98840772, 98841227]
>> 3       FOR_7         0     #0000FF [98843249, 98848364]
>> 4       FOR_9         0     #0000FF [98849664, 98850577]
>> 5      FOR_13         0     #0000FF [98862022, 98867230]
>> 6      FOR_15         0     #0000FF [98869143, 98870524]
>>
>> Also use 'mcols' to set the value of a metadata column.
>>
>> mcols(intervals)$newData <- seq_along(intervals)
>> > head(mcols(intervals))
>> DataFrame with 6 rows and 5 columns
>>          name     score     itemRgb                thick   newData
>>   <character> <numeric> <character>            <IRanges> <integer>
>> 1       FOR_3         0     #0000FF [98834146, 98837506]         1
>> 2       FOR_5         0     #0000FF [98840772, 98841227]         2
>> 3       FOR_7         0     #0000FF [98843249, 98848364]         3
>> 4       FOR_9         0     #0000FF [98849664, 98850577]         4
>> 5      FOR_13         0     #0000FF [98862022, 98867230]         5
>> 6      FOR_15         0     #0000FF [98869143, 98870524]         6
>>
>>
>> It looks like y_intervals() is returning a valid GRanges. Can you
>> provide an example of how you are getting a 'Genome_intervals' object?
>>
>> Valerie
>>
>>
>>
>> On 02/05/2013 01:05 PM, Hermann Norpois wrote:
>>> Hello,
>>>
>>> in the documentation of the HiTC package, y_intervals() is described as a
>>> method to "return the ygi GRanges object defining the y intervals".
>>> I tried this on the test data (see dput) and expected a "classical"
>>> GRanges object.
>>> For instance, I would like to do an operation like
>>>
>>> mcols (hic.gr)$test<- 1
>>>
>>> But it did not work, as hic.gr is a Genome_intervals object (as the dput
>>> output shows). Can I transform it into a classical GRanges object allowing
>>> mcols operations? Can anybody comment on the difference between
>>> Genome_intervals and GRanges?
>>>
>>>
>>> Thanks
>>> Hermann
>>>
>>>
>>>
>>>> hic.gr
>>> Object of class Genome_intervals
>>> 5 base intervals and 0 inter-base intervals(*):
>>> chr14 [1, 999999]
>>> chr14 [1e+06, 1999999]
>>> chr14 [2e+06, 2999999]
>>> chr14 [3e+06, 3999999]
>>> chr14 [4e+06, 4999999]
>>>
>>> annotation:
>>>    seq_name       id inter_base
>>> 1    chr14 HIC_bin1      FALSE
>>> 2    chr14 HIC_bin2      FALSE
>>> 3    chr14 HIC_bin3      FALSE
>>> 4    chr14 HIC_bin4      FALSE
>>> 5    chr14 HIC_bin5      FALSE
>>>
>>>> dput (hic.gr)
>>> new("Genome_intervals"
>>>      , .Data = structure(c(1, 1e+06, 2e+06, 3e+06, 4e+06, 999999,
>>>      1999999, 2999999, 3999999, 4999999), .Dim = c(5L, 2L))
>>>      , annotation = structure(list(seq_name = structure(c(1L, 1L,
>>>      1L, 1L, 1L), .Label = "chr14", class = "factor"),
>>>      id = structure(c(1L, 20L, 31L, 42L, 53L), .Label = c("HIC_bin1",
>>>      "HIC_bin10", "HIC_bin100", "HIC_bin101", "HIC_bin102", "HIC_bin103",
>>>      "HIC_bin104", "HIC_bin105", "HIC_bin106", "HIC_bin107", "HIC_bin11",
>>>      "HIC_bin12", "HIC_bin13", "HIC_bin14", "HIC_bin15", "HIC_bin16",
>>>      "HIC_bin17", "HIC_bin18", "HIC_bin19", "HIC_bin2", "HIC_bin20",
>>>      "HIC_bin21", "HIC_bin22", "HIC_bin23", "HIC_bin24", "HIC_bin25",
>>>      "HIC_bin26", "HIC_bin27", "HIC_bin28", "HIC_bin29", "HIC_bin3",
>>>      "HIC_bin30", "HIC_bin31", "HIC_bin32", "HIC_bin33", "HIC_bin34",
>>>      "HIC_bin35", "HIC_bin36", "HIC_bin37", "HIC_bin38", "HIC_bin39",
>>>      "HIC_bin4", "HIC_bin40", "HIC_bin41", "HIC_bin42", "HIC_bin43",
>>>      "HIC_bin44", "HIC_bin45", "HIC_bin46", "HIC_bin47", "HIC_bin48",
>>>      "HIC_bin49", "HIC_bin5", "HIC_bin50", "HIC_bin51", "HIC_bin52",
>>>      "HIC_bin53", "HIC_bin54", "HIC_bin55", "HIC_bin56", "HIC_bin57",
>>>      "HIC_bin58", "HIC_bin59", "HIC_bin6", "HIC_bin60", "HIC_bin61",
>>>      "HIC_bin62", "HIC_bin63", "HIC_bin64", "HIC_bin65", "HIC_bin66",
>>>      "HIC_bin67", "HIC_bin68", "HIC_bin69", "HIC_bin7", "HIC_bin70",
>>>      "HIC_bin71", "HIC_bin72", "HIC_bin73", "HIC_bin74", "HIC_bin75",
>>>      "HIC_bin76", "HIC_bin77", "HIC_bin78", "HIC_bin79", "HIC_bin8",
>>>      "HIC_bin80", "HIC_bin81", "HIC_bin82", "HIC_bin83", "HIC_bin84",
>>>      "HIC_bin85", "HIC_bin86", "HIC_bin87", "HIC_bin88", "HIC_bin89",
>>>      "HIC_bin9", "HIC_bin90", "HIC_bin91", "HIC_bin92", "HIC_bin93",
>>>      "HIC_bin94", "HIC_bin95", "HIC_bin96", "HIC_bin97", "HIC_bin98",
>>>      "HIC_bin99"), class = "factor"), inter_base = c(FALSE, FALSE,
>>>      FALSE, FALSE, FALSE)), .Names = c("seq_name", "id", "inter_base"),
>>>      row.names = c(NA, 5L), class = "data.frame")
>>>      , closed = structure(c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
>>>      TRUE, TRUE, TRUE), .Dim = c(5L, 2L))
>>>      , type = "Z"
>>> )
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
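
A note on the conversion quoted above: GRanges ranges are always closed
integer ranges, whereas a Genome_intervals object records the closure of
each endpoint in its 'closed' slot, so passing the stored endpoints
directly to IRanges() turns [1, 2) into [1, 2]. What follows is only a
minimal sketch, not part of the original thread, of one way to adjust
open endpoints first. The helper closed_bounds() and the toy matrices are
hypothetical stand-ins for i@.Data and i@closed, and inter-base intervals
are not handled. The GRanges step runs only if GenomicRanges is installed.

```r
# Hypothetical helper (not from HiTC or genomeIntervals): shift open
# endpoints of integer-valued intervals inward so both ends are closed.
closed_bounds <- function(endpoints, closed) {
  stopifnot(identical(dim(endpoints), dim(closed)), ncol(endpoints) == 2)
  data.frame(
    start = as.integer(endpoints[, 1] + ifelse(closed[, 1], 0, 1)),
    end   = as.integer(endpoints[, 2] - ifelse(closed[, 2], 0, 1))
  )
}

# Toy stand-ins for i@.Data and i@closed: chr01 [1, 2) and chr01 [3, 5)
endpoints <- matrix(c(1, 3, 2, 5), ncol = 2)
closed    <- matrix(c(TRUE, TRUE, FALSE, FALSE), ncol = 2)
bounds    <- closed_bounds(endpoints, closed)
print(bounds)

# Build the GRanges only when the package is available
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  library(GenomicRanges)
  gr <- GRanges(c("chr01", "chr01"), IRanges(bounds$start, bounds$end))
  mcols(gr)$inter_base <- c(FALSE, FALSE)
  print(gr)
}
```

With these toy values the adjusted ranges are [1, 1] and [3, 4].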


-- 
Nicolas Servant
Plateforme de Bioinformatique
Unité 900 : Institut Curie - Inserm - Mines ParisTech
26, rue d'Ulm - 75248 Paris Cedex 05 - FRANCE

Email: Nicolas.Servant at curie.fr
Tel: 01 56 24 69 85
http://bioinfo.curie.fr/
