[BioC] ggbio: Data stored twice in 'GGbio' object
julian.gehring at embl.de
Tue Aug 6 22:19:24 CEST 2013
I agree that the main problem is that the data is effectively stored
twice, irrespective of whether this takes the form of two identical or
merely similar objects. Especially with the large amounts of genomic
data in mind, this way of handling data may not scale well.
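To make the render-time reduction that Michael describes below a bit more
concrete, here is a minimal base-R sketch. The class 'LazyPlot' and the
helpers 'lazy_plot()' and 'plot_data()' are made up for this illustration
and are not part of ggbio or ggplot2. The sketch only assumes that some
function can turn the original object into the data.frame needed for
plotting.

```r
library(methods)

# Hypothetical class (not part of ggbio): keeps the original data once,
# plus a function that knows how to reduce it to a data.frame.
setClass("LazyPlot", slots = c(data = "ANY", reducer = "function"))

# Hypothetical constructor; 'reducer' defaults to as.data.frame, which is
# an assumption that holds for data.frame input.
lazy_plot <- function(data, reducer = as.data.frame) {
  new("LazyPlot", data = data, reducer = reducer)
}

# The data.frame is derived only when it is requested (e.g. at render
# time) and is never stored in the object itself.
plot_data <- function(p) p@reducer(p@data)

df <- data.frame(x = 1:10, y = rnorm(10))
p <- lazy_plot(df)

slotNames(p)                  # only one slot holds data
identical(plot_data(p), df)   # reduced view matches the input
```

With a layout like this, the object carries a single copy of the data,
at the cost of redoing the conversion each time the plot is drawn.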
On 08/06/2013 08:29 PM, Michael Lawrence wrote:
> This is a flaw in the design of ggbio. It was a solution to the problem of
> ggplot2 requiring a data.frame in the plot object, while ggbio would like
> to keep the original data structure (like a GRanges) around. Probably the
> correct solution is for ggbio to extend the ggplot object, or otherwise
> represent the plot, and to perform the necessary reduction of the data when
> the plot is rendered. This is how the ggsubplot package works, although it
> is not changing the underlying data structure.
> But the data is only stored *exactly* twice if the input data is a
> data.frame. It's not very efficient to store the data twice, but my main
> concern is the redundancy in the data model.
> On Tue, Aug 6, 2013 at 2:33 AM, Julian Gehring <julian.gehring at embl.de> wrote:
>> The 'ggbio::ggplot' (ggbio_1.9.7, R_2013-08-05 r63513) function seems to
>> store its data twice.
>> df = data.frame(x = 1:10, y = rnorm(10))
>> p = ggbio::ggplot(data = df)
>> identical(p@data, p@ggplot$data) ## TRUE
>> shows that the data 'df' is stored in p@data as well as p@ggplot$data.
>> Especially for large data sets, this is inefficient. Is there a good
>> reason for this?
>> Best wishes