[BioC] IRanges: RangedData.values() deletes rownames

Michael Dondrup Michael.Dondrup at uni.no
Thu Nov 4 15:58:19 CET 2010


Hi,

I have done a little more testing on the values()<- function and I can only warn against using it, at least when
there are multiple spaces in the RangedData object. You will have no means of getting the names back in the right order!

Even if it didn't delete the range names, it would shuffle the data, and I don't see a use case where this function
could be used in a sensible way. The problem is the order in which the data is kept. For example, I read some data into a data.frame,
then made a RangedData from it, and then assigned more value columns taken from the data.frame. But the data in the RangedData object
are no longer in the original order; they are in the order of the RangedData, sorted by space.

So how are you supposed to get the order right in the DataFrame before using values()<-?
It's not possible.
Therefore I propose that either the function be removed, or
that the values be matched on the row.names of the DataFrame, with an error thrown if they differ or if there are no row.names, or that the
data be added in the order of the ranges of the RangedData (which should preserve the original order).
If this matching is not done, I see no way of guessing the right order for the DataFrame.
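
To make the proposed matching rule concrete, here is a minimal sketch in base R. The helper match_rows() is hypothetical and not part of IRanges; it only illustrates the rule "match on row names, and throw an error if they are missing or differ".

```r
# Hypothetical helper (not an IRanges function): return the row index that
# puts values named 'source_names' into the order given by 'target_names'.
match_rows <- function(target_names, source_names) {
  if (is.null(source_names))
    stop("new values have no row names; their order cannot be determined")
  if (anyDuplicated(target_names) || anyDuplicated(source_names))
    stop("row names must be unique")
  if (length(target_names) != length(source_names) ||
      !setequal(target_names, source_names))
    stop("row names of the new values differ from the names of the ranges")
  match(target_names, source_names)
}

# Names as stored in the container (assumed order) and new values in file order.
target <- c("b", "d", "a", "c")
new_values <- data.frame(score = c(10, 20, 30, 40),
                         row.names = c("a", "b", "c", "d"))

# Reorder the new values to follow the container's names.
idx <- match_rows(target, rownames(new_values))
aligned <- new_values[idx, , drop = FALSE]
```

With a check like this, values that cannot be matched unambiguously are rejected instead of being silently attached to the wrong ranges.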

Some example code to illustrate what I mean:


> rd = RangedData(ranges=IRanges(start=1:10, width=1, names=letters[1:10]), space=sample(1:2, 10, replace=TRUE))
> rn = rownames(rd) # save the names in the right order
> values(rd) = DataFrame(data=letters[1:10], row.names=letters[1:10])
> rd # it's broken, but you don't see it
> row.names(rd) = rn # now everything is broken, but at least you can see it:
> rd
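
The same effect can be reproduced without IRanges. The following is only a sketch in base R: it assumes, as described above, that the container keeps its rows grouped by space, and it mimics that with order(). The table and its column names are made up for illustration.

```r
# A small table in its original (file) order; all names here are illustrative.
tab <- data.frame(
  name  = letters[1:6],
  start = 1:6,
  space = c(2, 1, 2, 1, 1, 2),
  extra = LETTERS[1:6],
  stringsAsFactors = FALSE
)

# Mimic a container that stores its rows grouped by space.
grouped <- tab[order(tab$space), c("name", "start", "space")]

# Assigning the extra column by position attaches it to the wrong rows.
by_position <- grouped
by_position$extra <- tab$extra

# Matching on the names restores the correct pairing.
by_name <- grouped
by_name$extra <- tab$extra[match(grouped$name, tab$name)]
```

In by_position the extra column is still in file order while the rows are grouped by space, so the pairs no longer agree; in by_name every row carries its own value again.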





On Nov 3, 2010, at 9:51 PM, Michael Lawrence wrote:

> 
> 
> On Wed, Nov 3, 2010 at 9:06 AM, Michael Dondrup <Michael.Dondrup at uni.no> wrote:
> Hi,
> 
> I remember having posted something like this earlier.
> 
> Calling values()<- on a RangedData object deletes the range names if the DataFrame doesn't have names. Is
> that intentional?
> 
> > rd = RangedData(ranges=IRanges(start=1:2, width=1, names=c("A","B")), space=1)
> > rownames(rd)
> [1] "A" "B"
> > values(rd) = DataFrame(somedata=1:2)
> > rownames(rd)
> NULL
> 
> 
> Definitely a bug, because it yields an invalid object, where the ranges have names but the data do not. The question is how to rectify the names. You're expecting that, if the DataFrame has NULL names, it should take the names of the RangedData. That makes sense to me. If the rownames on the DataFrame were not NULL and different from the RangedData, what should happen? I'm thinking that should throw an error; that's how ranges<- has always behaved.
> 
> Anyway, I checked in a fix to the devel version. Thanks for reporting this.
> 
> Michael
>  
> I can work around this by setting them again:
> 
> values(rd) = DataFrame(somedata=1:2, row.names=c("A","B"))
>  but that's still a glitch...
> 
> Michael
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] rtracklayer_1.10.0 RCurl_1.4-3        bitops_1.0-4.1     IRanges_1.8.0
> 
> 


