[BioC] loop over IRanges spaces

Yvan yvan.strahm at uni.no
Tue May 11 11:15:38 CEST 2010


Hello,

I tried your code below, but and iI am not able to create the 
scoreRleList as weight could be only a integer list.

 > read_gff
function(file="gff3.txt") {
         gff <- read.delim(file,header=FALSE)
         colnames(gff) <- c("seqname", "source", "feature", "start", 
"end", "score", "strand","frame", "comments")
         return(gff)
}
 > library(IRanges)

Attaching package: 'IRanges'

The following object(s) are masked from 'package:base':

     cbind, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int,
     rbind, rep.int, table

 > 
gff<-read_gff("/home/yvan/Lundalm/R/projects/coen/gff_original/50Lines.gff")
 > 
rd<-RangedData(IRanges(gff$start,width=1),score=gff$score,start=gff$start,space=gff$seqname)
 > scoreRleList<-coverage(rd, weight ="score",width = 
as.list(table(rd$space)))
Error in .local(x, shift, width, weight, ...) :
   'weight' must be a non-empty list of integers
 >
Can I change/force the type of 'weight' ?


In a different approach, I try to use different views on the data set. 
This was motivated by the fact that the difference on the start of each 
probes could indicate either a change of scaffold or a gap inside a 
scaffold.

 > first_view<-slice(diff(rd$start), lower=10, upper=200)
 > first_view
Views on a 60-integer XInteger subject
subject: 28 24 25 37 23 24 32 34 25 23 63 ... 24 35 24 29 31 28 19 29 27 
29 37
views:
     start end width
[1]     1  19    19 [28 24 25 37 23 24 32 34 25 23 63 21 36 27 19 33 33 
24 32]
[2]    21  39    19 [ 31  22  34  18  36  25 ...  31 108  34  28  26  
23  33]
[3]    41  60    20 [32 28 26 20 33 25 30 30 24 ... 29 31 28 19 29 27 29 
37 22]
 >

so here the three views are the three scaffold. but if I want to change 
the subject of the view to the score
 > 
second_view<-Views(subject=rd$score,start=first_view$start,end=first_view$end)
Error in function (classes, fdef, mtable)  :
   unable to find an inherited method for function "Views", for 
signature "numeric"
 >

Again can I change the expected type of the signature in the Views function.

Thanks again for your help.
Cheers,
yvan

On 04/26/2010 09:08 PM, Patrick Aboyoun wrote:
> Yvan,
> It appears to me that you are trying to perform two conflicting 
> activities:
>
> 1) Calculate the running sum of a metric over an annotated sequence 
> (as evidenced by your aggregate function call)
> 2) Find the sum for a metric across specified intervals on the 
> annotated sequence (as evidenced by your desire to assign the 
> aggregated sums into an existing RangedData object)
>
> Taking a step back, I am guessing that you are trying to transform 
> something akin to a UCSC bed file into something else that is UCSC bed 
> file like. If you are using rtracklayer, this means your initial data 
> are stored in a RangedData object. To create a RangedData object 
> containing the running sum of a values column from an initial 
> RangedData object, I recommend:
>
> 1) Creating an RleList object from the RangedData object using the 
> coverage function. Make sure to specify the metric of interest in the 
> weight argument to coverage.
> 2) Using the runsum function on the RleList object to calculate your 
> running sums.
> 3) Creating a RangedData object from the RleList object in step 2 
> using as(<<obj>>, "RangedData")
>
> Here is an example:
>
> > # Step 1: create an RleList representation of the metric
> > rd <- RangedData(IRanges(start = c(5, 10, 15, 2, 4, 8), end = c(7, 
> 14, 21, 3, 6, 9)),
>                    score = 1:6, space = rep(c("A", "B"), each = 3))
> > scoreRleList <- coverage(rd, weight = "score", width = list(A = 30, 
> B = 10))
> > scoreRleList
> SimpleRleList of length 2
> $A
> 'integer' Rle of length 30 with 6 runs
>   Lengths: 4 3 2 5 7 9
>   Values : 0 1 0 2 3 0
>
> $B
> 'integer' Rle of length 10 with 6 runs
>   Lengths: 1 2 3 1 2 1
>   Values : 0 4 5 0 6 0
>
> > # Step 2: calculate the running sums
> > scoreRunsum <- runsum(scoreRleList, k = 3, endrule = "constant")
> > scoreRunsum
> SimpleRleList of length 2
> $A
> 'integer' Rle of length 30 with 15 runs
>   Lengths: 3 1 1 1 1 1 1 1 3 1 1 5 1 1 8
>   Values : 0 1 2 3 2 1 2 4 6 7 8 9 6 3 0
>
> $B
> 'integer' Rle of length 10 with 7 runs
>   Lengths:  2  1  1  1  1  1  3
>   Values :  8 13 14 15 10 11 12
>
>
> > # Step 3: Create a RangedData representation of the running sums
> > rdRunsum <- as(scoreRunsum, "RangedData")
> > rdRunsum
> RangedData with 22 rows and 1 value column across 2 spaces
>           space    ranges   |     score
> <character> <IRanges>   | <integer>
> 1             A  [ 1,  3]   |         0
> 2             A  [ 4,  4]   |         1
> 3             A  [ 5,  5]   |         2
> 4             A  [ 6,  6]   |         3
> 5             A  [ 7,  7]   |         2
> 6             A  [ 8,  8]   |         1
> 7             A  [ 9,  9]   |         2
> 8             A  [10, 10]   |         4
> 9             A  [11, 13]   |         6
> ...         ...       ... ...       ...
> 14            A  [22, 22]   |         3
> 15            A  [23, 30]   |         0
> 16            B  [ 1,  2]   |         8
> 17            B  [ 3,  3]   |        13
> 18            B  [ 4,  4]   |        14
> 19            B  [ 5,  5]   |        15
> 20            B  [ 6,  6]   |        10
> 21            B  [ 7,  7]   |        11
> 22            B  [ 8, 10]   |        12
>
> > sessionInfo()
> R version 2.11.0 Patched (2010-04-24 r51820)
> i386-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] IRanges_1.6.1
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
>
>
>
> On 4/23/10 6:26 AM, Michael Lawrence wrote:
>> Also note that it's not really necessary to loop here, as is often 
>> the case
>> with IRanges:
>>
>> rd$windo<- unlist(runmean(RleList(values(rd)[,"score"])))
>>
>> On Fri, Apr 23, 2010 at 6:13 AM, Michael Lawrence<michafla at gene.com>  
>> wrote:
>>
>>>
>>> On Fri, Apr 23, 2010 at 1:49 AM, Yvan<yvan.strahm at uni.no>  wrote:
>>>
>>>>   Hello,
>>>> Thank you both of you.
>>>>
>>>> I could could calculate the sliding window, but not as a Rle 
>>>> object, could
>>>> not append values for the last w-1 position in the Rle object in 
>>>> order to
>>>> take care of the size problem.
>>>>
>>>>
>>> Why not? Rle supports all the normal vector operations. And runsum or
>>> runmean will output a vector of the same size as the input, using a 
>>> choice
>>> of two endrules. If you want 0's at the end, try something like:
>>>
>>> rle[(nrow(rd)-w+1):nrow(rd)]<- 0
>>>
>>>
>>>
>>>> So I did it like that:
>>>>
>>>> params<-RDApplyParams(rd,function(rd)
>>>> append((diff(c(0,cumsum(rd$score)),lag=w)/w),rep(0,each=w-1),after=(length(rd$score)-w+1))) 
>>>>
>>>>
>>>> But when I try to add the new values to the rangedData object I got 
>>>> these
>>>> error.
>>>>
>>>>> values(rd)[,"windo"]<-rdapply(params)
>>>> Error in `[<-`(`*tmp*`, , j, value =<S4 object of class 
>>>> "DataFrame">) :
>>>>    ncol(x[j]) != ncol(value)
>>>> In addition: Warning messages:
>>>> 1: In mapply(f, ..., SIMPLIFY = FALSE) :
>>>>    longer argument not a multiple of length of shorter
>>>> 2: In mapply(f, ..., SIMPLIFY = FALSE) :
>>>>    longer argument not a multiple of length of shorter
>>>>
>>>> But when I check the size, they are the same, here for one space
>>>>
>>>>> x<-rdapply(params)
>>>>> length(x$SCAFFOLD_100) == length(rd["SCAFFOLD_100"]$windo)
>>>> [1] TRUE
>>>>
>>>>
>>> It may be that that type of insertion is unsupported. Why not just do
>>> something like:
>>>
>>> rd$window<- unlist(x)
>>>
>>>
>>>>   Maybe params miss a parameter or the way I try to update the rd 
>>>> object is
>>>> wrong. Anyway form the rdapply output a vector could be created and 
>>>> so a new
>>>> rd object with the new value column.
>>>>
>>>> yvan
>>>>
>>>>
>>>> On 22/04/10 15:51, Michael Lawrence wrote:
>>>>
>>>>
>>>>
>>>> On Thu, Apr 22, 2010 at 5:49 AM, Michael 
>>>> Dondrup<Michael.Dondrup at uni.no>wrote:
>>>>
>>>>> Hi,
>>>>> how about funtion rdapply (not lapply) which is for that?
>>>>>
>>>>>
>>>> lapply() should apply per-space as well, basically providing a 
>>>> short-cut
>>>> for the more complicated rdapply().
>>>>
>>>>> lapply(rd, function(x) sum(x$score))
>>>> $chr1
>>>> [1] 3
>>>>
>>>> $chr2
>>>> [1] 0
>>>>
>>>> sapply() also works:
>>>>> sapply(rd, function(x) sum(x$score))
>>>> chr1 chr2
>>>>     3    0
>>>>
>>>> Another choice is tapply:
>>>>> tapply(rd$score, space(rd), sum)
>>>> chr1 chr2
>>>>     3    0
>>>>
>>>> Michael
>>>>
>>>>   The code below computes the sum score for each space in the 
>>>> RangedData:
>>>>> # taken from the examples mostly:
>>>>>> ranges<- IRanges(c(1,2,3),c(4,5,6))
>>>>>>    score<- c(2L, 0L, 1L)
>>>>>>    rd<- RangedData(ranges, score, space = c("chr1","chr2","chr1"))
>>>>>> rd
>>>>> RangedData with 3 rows and 1 value column across 2 spaces
>>>>>         space    ranges |     score
>>>>> <character> <IRanges>  |<integer>
>>>>> 1        chr1    [1, 4] |         2
>>>>> 2        chr1    [3, 6] |         1
>>>>> 3        chr2    [2, 5] |         0
>>>>>> params<- RDApplyParams(rd, function(rd) sum(score(rd)))
>>>>>> rdapply(params)
>>>>> $chr1
>>>>> [1] 3
>>>>>
>>>>> $chr2
>>>>> [1] 0
>>>>>
>>>>>
>>>>> Cheers
>>>>> Michael
>>>>>
>>>>> Am Apr 22, 2010 um 1:57 PM schrieb Yvan:
>>>>>
>>>>>> On 21/04/10 18:43, Michael Lawrence wrote:
>>>>>>>
>>>>>>> On Wed, Apr 21, 2010 at 6:07 AM, Yvan<yvan.strahm at uni.no
>>>>>>> <mailto:yvan.strahm at uni.no>>  wrote:
>>>>>>>
>>>>>>>     Hello List,
>>>>>>>
>>>>>>>     I am confused about  how to loop over a rangedData object.
>>>>>>>     I have this rangedData
>>>>>>>
>>>>>>>     RangedData with 61 rows and 1 value column across 3 spaces
>>>>>>>             space     ranges |     score
>>>>>>> <character> <IRanges>  |<numeric>
>>>>>>>     1   SCAFFOLD_1 [  8,   8] |  -0.09405
>>>>>>>
>>>>>>>     and the spaces are
>>>>>>>
>>>>>>>     "SCAFFOLD_1"   "SCAFFOLD_10"  "SCAFFOLD_100"
>>>>>>>
>>>>>>>     using aggregate it is possible to apply a function to one of 
>>>>>>> the
>>>>> space
>>>>>>>     aggregate(rd["SCAFFOLD_1"]$score, start =
>>>>>>>     1:(length(rd["SCAFFOLD_1"]$score)-w+1), width = w, FUN = sum)
>>>>>>>
>>>>>>>     but how can I apply the aggregate to all space without a for 
>>>>>>> loop ?
>>>>>>>
>>>>>>>
>>>>>>> It looks like you're attempting a running window sum of the score
>>>>>>> vector. There are more efficient ways of doing this besides
>>>>>>> aggregate(). If you convert the score into an Rle, you can use
>>>>> runsum().
>>>>>>> Anyway, to do this over each space individually, use lapply().
>>>>>>>
>>>>>>> This would come out to something like:
>>>>>>>
>>>>>>> values(rd)[,"smoothScore"]<- lapply(rd, function(x)
>>>>>>> runsum(Rle(x$score), w))
>>>>>>>
>>>>>>> Probably not exactly right, but it gets you in the right 
>>>>>>> direction...
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>> Hello Michael,
>>>>>>
>>>>>> Thanks for the answer and the tip about runsum!
>>>>>> I try with lapply but could not get it working right, the main 
>>>>>> problem
>>>>>> is that the runsum is calculated on all values and not for a each
>>>>>> specific spaces.
>>>>>> Sorry, I should have been more precise in the problem description.
>>>>>> The runsum should be calculated in a space specific manner, let 
>>>>>> say w=2
>>>>>>
>>>>>> space        score    cumsum
>>>>>> 1 space1    1            3
>>>>>> 2 space1    2            4
>>>>>> 3 space1    2            NA
>>>>>> 4 space2    10          21
>>>>>> 5 space2    11          22
>>>>>> 6 space2    11          NA
>>>>>>
>>>>>> Is it possible to do it with lapply?
>>>>>> Thanks again for your help
>>>>>> cheers,
>>>>>> yvan
>>>>>>
>>>>>>        [[alternative HTML version deleted]]
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at stat.math.ethz.ch
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>>
>>>>
>>     [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list