[BioC] IRanges: cbind not well defined for RangedData?

Patrick Aboyoun paboyoun at fhcrc.org
Thu Mar 18 18:55:15 CET 2010


I have been experimenting with S4 dispatch on ... (variable-length arguments)
and reading the man page for dotsMethods

 > help(dotsMethods)

Long story short, adding support for cbind-ing a vector to an S4 object 
would probably involve either

1) creating an S4 class union of an S4 class (e.g. RangedData) with
vector, so that the existing S4 dispatch would choose the correct method, or
2) creating an S4 default method for cbind that has its own dispatch
mechanism for choosing a cbind method.
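
A minimal sketch of option 1, under stated assumptions: `Track` is a hypothetical toy S4 class standing in for RangedData, and the union is restricted to `numeric` rather than all of `vector` to keep it small. Once cbind is an S4 generic on `...` (as described in ?dotsMethods), a Track and a numeric vector share the union as a common superclass, so the union method is selected instead of the "ANY" default.

```r
library(methods)

## Hypothetical toy class standing in for RangedData: value columns are
## kept in a data.frame slot.
setClass("Track", representation(values = "data.frame"))

## Make cbind an S4 generic that dispatches on "..." (see ?dotsMethods).
setGeneric("cbind", signature = "...")

## Option 1: a class union gives Track and numeric a common superclass, so
## cbind(<Track>, <numeric>) selects this method instead of "ANY".
setClassUnion("TrackOrNumeric", c("Track", "numeric"))

setMethod("cbind", "TrackOrNumeric", function(..., deparse.level = 1) {
  args <- list(...)
  x <- args[[1L]]
  ## Assumption of this sketch: a Track, if present, is the first argument.
  ## The union also matches cbind() on plain numeric vectors, so those
  ## calls have to be handed back to base::cbind explicitly.
  if (!is(x, "Track"))
    return(base::cbind(..., deparse.level = deparse.level))
  ## Every remaining argument must be named; each becomes a value column.
  new_cols <- args[-1L]
  if (length(new_cols) > 0L &&
      (is.null(names(new_cols)) || any(names(new_cols) == "")))
    stop("additional columns must be named, e.g. cbind(x, a.value = v)")
  for (nm in names(new_cols))
    x@values[[nm]] <- new_cols[[nm]]
  x
})

tr <- new("Track", values = data.frame(score = c(1, 2, 3, 4)))
res <- cbind(tr, a.value = c(0.5, -1, 2, 0))
names(res@values)
```

The extra branch for plain numeric vectors is the kind of bookkeeping that makes this option unattractive: the union method now intercepts cbind() calls that have nothing to do with the S4 class.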

I don't find either of these options appealing, and I second Michael
Lawrence's suggestion of using "$<-" or "[[<-" to bind new columns to a
RangedData object.
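
For comparison, a minimal sketch of the "$<-" / "[[<-" approach on the same kind of hypothetical toy class (`Track` is not part of IRanges). Both replacement functions accept S4 methods for their first argument, so one method per operator is enough and cbind does not need to change at all.

```r
library(methods)

## Hypothetical toy container standing in for RangedData: the value
## columns live in a data.frame slot.
setClass("Track", representation(values = "data.frame"))

## x$name <- value: in S4 methods for "$<-", 'name' arrives as a string.
setReplaceMethod("$", "Track", function(x, name, value) {
  x@values[[name]] <- value
  x
})

## x[["name"]] <- value: same behaviour, indexed by column name.
setReplaceMethod("[[", "Track", function(x, i, j, ..., value) {
  x@values[[i]] <- value
  x
})

tr <- new("Track", values = data.frame(score = c(1, 2, 3, 4)))
tr$a.value <- c(0.5, -1, 2, 0)
tr[["b.value"]] <- c(1, 1, 1, 1)
names(tr@values)
```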

 > a.value <- rnorm(4)
 > rd1 <- RangedData(ranges=IRanges(start=runif(4, min=1, max=10E8), 
width=runif(4, min=1, max=10E5), names=paste("bla",1:4)), space=1:2)
 > obj <- cbind(rd1, a.value)
 > showMethods("cbind")
Function: cbind (package IRanges)
...="ANY"
...="DataFrame"
...="DataFrameList"
...="DataTable"
...="numeric#RangedData"
     (inherited from: ...="ANY")

 > df1 <- unlist(values(rd1))
 > class(df1)
[1] "DataFrame"
attr(,"package")
[1] "IRanges"
 > cbind(df1, a.value)
      df1 a.value
[1,] ?   -0.6268173
[2,] ?   2.540871
[3,] ?   0.4137926
[4,] ?   -0.897856
 > showMethods("cbind")
Function: cbind (package IRanges)
...="ANY"
...="DataFrame"
...="DataFrame#numeric"
     (inherited from: ...="ANY")
...="DataFrameList"
...="DataTable"
...="numeric#RangedData"
     (inherited from: ...="ANY")

 > sessionInfo()
R version 2.11.0 Under development (unstable) (2010-03-14 r51276)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] IRanges_1.5.64
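
The "(inherited from: ...="ANY")" entries in the showMethods() output above show what happens: when the arguments in ... have no common class with a defined method, dispatch falls back to the "ANY" method (presumably the default derived from base::cbind, given the list-matrix result). The sketch below reproduces that selection rule with a hypothetical generic `bindCols` and a hypothetical toy class `Track` (neither is part of IRanges), so the chosen method is visible as a return value.

```r
library(methods)

## Hypothetical toy class standing in for RangedData.
setClass("Track", representation(values = "data.frame"))

## Hypothetical generic that, like IRanges' cbind, dispatches on "..." only.
setGeneric("bindCols", function(...) standardGeneric("bindCols"),
           signature = "...")

## Fallback, analogous to the ...="ANY" entry in showMethods("cbind").
setMethod("bindCols", "ANY", function(...) "ANY method")

## Selected only when every argument in ... is a Track (or a subclass).
setMethod("bindCols", "Track", function(...) "Track method")

tr <- new("Track", values = data.frame(score = c(1, 2, 3, 4)))
bindCols(tr, tr)        # returns "Track method"
bindCols(tr, rnorm(4))  # Track + numeric share no class with a method: "ANY method"
```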


On 3/18/10 10:32 AM, Michael Lawrence wrote:
> On Thu, Mar 18, 2010 at 7:55 AM, Michael Dondrup<Michael.Dondrup at uni.no>wrote:
>
>> Hi,
>> here is another little possible glitch with RangedData and cbind(). I
>> would actually like to propose changing or expanding the behavior of the
>> cbind function, or adding to its documentation. The use case is as
>> follows:
>> Assume we have some chromosomal ranges in a RangedData object. Then we can
>> iteratively compute statistics on these ranges and attach them to the
>> DataFrame holding extra data, e.g. some count data, or combine quality
>> scores, possibly from multiple conditions.
>>
>> So according to the documentation of the RangedData class,
>>
>>> The first mode treats the object as a contiguous "data frame" annotated
>>> with range information. The accessors start, end, and width get the
>>> corresponding fields in the ranges as atomic integer vectors, undoing
>>> the division over the spaces. The [[ and matrix-style [, extraction and
>>> subsetting functions unroll the data in the same way. [[<- does the inverse.
>>
>> I assumed I could use cbind(rd, a.value) to attach the statistics to the
>> internal data representation. Would it be possible to make cbind return
>> something more useful, or are there better ways to do it?
>>
> Right now it's just using the cbind method for "ANY", because no cbind
> method exists for RangedData. To be honest, I've always just used the $<-
> syntax for adding the statistics. This seems like it would work well in
> your use case as well.
>
> Like:
>
> rd$a.value <- a.value
>
> Michael
>
>> Best
>> Michael
>>
>>
>> Example:
>>
>> > a.value = rnorm(4)
>> > rd1 = RangedData(ranges=IRanges(start=runif(4, min=1, max=10E8),
>> width=runif(4, min=1, max=10E5), names=paste("bla",1:4)), space=1:2)
>> > rd1
>> RangedData with 4 rows and 0 value columns across 2 spaces
>>             space                 ranges |
>>       <character>               <IRanges>  |
>> bla 1           1 [773679042, 774010137] |
>> bla 3           1 [194819013, 195136171] |
>> bla 2           2 [183105318, 183509803] |
>> bla 4           2 [107730452, 107823748] |
>>
>> > obj = cbind(rd1, a.value)
>>
>> And I would intuitively expect the result to look exactly like this:
>>
>> > RangedData(ranges=IRanges(start=runif(4, min=1, max=10E8), width=runif(4,
>> min=1, max=10E5), names=paste("bla",1:4)), space=1:2, a.value)
>> RangedData with 4 rows and 1 value column across 2 spaces
>>             space                 ranges |    a.value
>>       <character>               <IRanges>  |<numeric>
>> bla 1           1 [473042533, 473820859] | -1.7956588
>> bla 3           1 [ 75991383,  76022516] |  0.3588571
>> bla 2           2 [475385363, 476224756] |  1.4166218
>> bla 4           2 [532603052, 532902678] |  0.2324424
>>
>> But what I get is quite different:
>>
>> > class(obj)
>> [1] "matrix"
>> > typeof(obj)
>> [1] "list"
>> > obj
>>      rd1 a.value
>> [1,] ?   0.3255676
>> [2,] ?   0.5913471
>> [3,] ?   0.9317755
>> [4,] ?   -0.8897527
>>
>> > sessionInfo()
>> R version 2.10.1 (2009-12-14)
>> x86_64-apple-darwin9.8.0
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] IRanges_1.4.9
>>
>> loaded via a namespace (and not attached):
>> [1] tools_2.10.1
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>