[BioC] IRanges: list columns in RangedData objects (was Re: IRanges: cbind not well defined for RangedData?)

Patrick Aboyoun paboyoun at fhcrc.org
Sat Mar 20 03:28:41 CET 2010


I've done some testing of as.data.frame on a RangedData object and
found that the existing coercion method produced incorrect results in
certain circumstances when there was a list, SimpleList, or
CompressedList data column, due to vector recycling. For now,
as.data.frame for a RangedData object will throw an error if it
contains a list, SimpleList, or CompressedList data column. If there is
demand for as.data.frame supporting list columns, we can take another
look at this issue.
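
For illustration only, here is a minimal base R sketch of the recycling
behavior and of a guard in the same spirit. It does not use IRanges and it
is not the actual IRanges code; the helper names has_list_column and
strict_as_data_frame are hypothetical, and plain lists stand in for
RangedData value columns.

```r
# Base R only; plain lists stand in for RangedData value columns.

# data.frame() expands a plain list argument into one column per element
# and recycles each length-1 element down the rows.
expanded <- data.frame(x = 1:4, y = as.list(2:5))
dim(expanded)

# Wrapping the list in I() keeps it as a single list column instead.
kept <- data.frame(x = 1:4, y = I(as.list(2:5)))
dim(kept)

# Alternatively, decorate a list with the data.frame attributes by hand.
decorated <- list(x = 1:4, y = as.list(2:5))
attr(decorated, "row.names") <- 1:4
class(decorated) <- "data.frame"

# Hypothetical guard: TRUE if any column is a list.
has_list_column <- function(cols) {
  any(vapply(cols, is.list, logical(1)))
}

# Hypothetical strict coercion: refuse list columns rather than
# silently expanding them.
strict_as_data_frame <- function(cols) {
  if (has_list_column(cols)) {
    stop("cannot coerce to data.frame: list columns are not supported")
  }
  as.data.frame(cols, stringsAsFactors = FALSE)
}
```

With these definitions, strict_as_data_frame(list(x = 1:4, y = as.list(2:5)))
stops with an error, while a list holding only atomic vectors is coerced
as usual.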


Thanks,
Patrick


On 3/19/10 5:14 PM, Patrick Aboyoun wrote:
> Michael L.,
> Given that we have IntegerList objects to store lists of integers, I am
> not inclined to build logic for printing a list column in a DataTable.
> To change the current behavior, the relevant method to work on is
> showAsCell,list-method.
>
> The conversion of a DataTable to a data.frame when the DataTable
> contains some non-atomic columns is a bit dicey. I'm not sure whether a
> data.frame truly supports list columns or whether that was grandfathered
> in because data.frame inherits from list. For example, the data.frame
> constructor converts list inputs to multiple columns:
>
>   >  data.frame(x = 1:4, y = as.list(2:5))
>     x y.2L y.3L y.4L y.5L
> 1 1    2    3    4    5
> 2 2    2    3    4    5
> 3 3    2    3    4    5
> 4 4    2    3    4    5
>
> We can circumvent this behavior by decorating a list object with the
> necessary data.frame attributes, but I'm not sure how many methods will
> be able to handle a data.frame with a list column properly.
>
>
> Patrick
>
>
> On 3/19/10 3:46 PM, Michael Lawrence wrote:
>    
>>
>> On Fri, Mar 19, 2010 at 12:59 PM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
>>
>>      Michael,
>>      Thanks for the report. RangedData objects have been designed to
>>      hold list objects in the values columns. You did, however, find a
>>      bug in the printing of a RangedData object when it contains a list
>>      column. I fixed the show method in both BioC 2.5 IRanges (>=
>>      1.4.16) and BioC 2.6 IRanges (>= 1.5.66) to handle this case.
>>
>>      >  rd<- RangedData(IRanges(start=1:4, width=10,
>>      names=paste("a",1:4)), space=1:2 )
>>      >  rd$a.value<- rnorm(4)
>>      >  rd$a.list<- as.list(1:4)
>>      >  rd
>>      RangedData with 4 rows and 2 value columns across 2 spaces
>>               space    ranges |   a.value   a.list
>>      <character>  <IRanges>  |<numeric>  <list>
>>      a 1           1   [1, 10] | 0.5362468 ########
>>      a 3           1   [3, 12] | 0.5459593 ########
>>      a 2           2   [2, 11] | 0.4705777 ########
>>      a 4           2   [4, 13] | 0.4160833 ########
>>
>>
>> Thanks for doing this Patrick, but what's the deal with the #'s? I
>> mean, how about "1, 2, 3, 4" instead? That's how data.frame prints it.
>>
>>      As you noticed, a list column in a RangedData object will result
>>      in column expansion if you convert it to a data.frame, which can
>>      lead to a large data object if the number of rows in the RangedData
>>      object is large.
>>
>>
>> Does this make sense? data.frame can handle list columns.
>>
>> data(mtcars)
>> mtcars$a.list<- list(1:4)
>>
>>      Since the show method prints out the classes of each of the
>>      columns, the user will be able to check to ensure their data
>>      columns are stored correctly prior to any conversion to a data.frame.
>>
>>      >  as.data.frame(rd)
>>       space start end width names   a.value a.list.1L a.list.2L
>>      a.list.3L a.list.4L
>>      1     1     1  10    10   a 1 0.5362468         1         2
>>        3         4
>>      2     1     3  12    10   a 3 0.5459593         1         2
>>        3         4
>>      3     2     2  11    10   a 2 0.4705777         1         2
>>        3         4
>>      4     2     4  13    10   a 4 0.4160833         1         2
>>        3         4
>>
>>
>>
>>      Patrick
>>
>>
>>      On 3/19/10 7:23 AM, Michael Dondrup wrote:
>>
>>          Dear Patrick and Michael,
>>
>>          thank you very much for your helpful support on my last two
>>          connected issues! It is covered in the examples in the
>>          documentation, but I must have overlooked it.
>>
>>          I tried it out immediately, and it works fine:
>>
>>
>>              rd = RangedData(IRanges(start=1:4, width=10,
>>              names=paste("a",1:4)), space=1:2 )
>>              rd
>>              rd$a.value = rnorm(4)
>>              rd
>>
>>          RangedData with 4 rows and 1 value column across 2 spaces
>>                  space    ranges |    a.value
>>          <character>  <IRanges>   |<numeric>
>>          1           1   [1, 10] | -0.6765515
>>          2           1   [3, 12] |  1.5406962
>>          3           2   [2, 11] | -1.2599696
>>          4           2   [4, 13] |  0.4971178
>>
>>          But then I had to reboot my computer because I accidentally
>>          tried this on 100,000 ranges where the value was actually a
>>          list, not a vector, and then the recycling rule struck me:
>>
>>
>>              rd$a.list = as.list(1:4)
>>
>>          At first everything seems fine and normal, but if you try to print it:
>>
>>              rd
>>
>>          RangedData with 4 rows and 1 value column across 2 spaces
>>          Error in .Method(..., deparse.level = deparse.level) :
>>            number of rows of matrices must match (see arg 2)
>>          or try to convert into a data.frame:
>>
>>              as.data.frame(rd)
>>
>>            space start end width names a.list.1L a.list.2L a.list.3L
>>          a.list.4L
>>          1     1     1  10    10   a 1         1         2         3
>>                4
>>          2     1     3  12    10   a 3         1         2         3
>>                4
>>          3     2     2  11    10   a 2         1         2         3
>>                4
>>          4     2     4  13    10   a 4         1         2         3
>>                4
>>
>>          When I tried this, R ran into some memory problems.
>>
>>          This is just a warning to make sure you really use a vector
>>          here. Maybe something to put in the
>>          type checking, or documentation?
>>
>>          Anyway, thanks a lot again
>>          Michael
>>
>>
>>
>>      _______________________________________________
>>      Bioconductor mailing list
>>      Bioconductor at stat.math.ethz.ch
>>      https://stat.ethz.ch/mailman/listinfo/bioconductor
>>      Search the archives:
>>      http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>>      