[BioC] undocumented behavior in tilingArray::costMatrix [Scanned]

Du, Yang duyang at fbn-dummerstorf.de
Thu Jul 26 12:25:38 CEST 2012


Dear Wolfgang,

I am looking at one particular function costMatrix in tilingArray, it
seems that the returning matrix is not consistent with the
documentation.

The calculation is fine for rows G[1:(maxk-1),]; but for the last row
G[maxk,] only the first column has been assigned a value outside the
loop, thus I am not sure if this is intended (as I feel this
would eliminate certain cases in the segmentation result).

Below is a small example, far from real but just a proof of concept,

1> x<-c(1,1,1,3,3,3)
## here the behavior is expected since only column 1 can be filled, from
column 2 has (2+6-1)>6 thus kept NA
> costMatrix(x, maxk=6)
     [,1]     [,2]     [,3] [,4] [,5] [,6]
[1,]  0.0 0.000000 0.000000    0    0    0
[2,]  0.0 0.000000 2.000000    0    0   NA
[3,]  0.0 2.666667 2.666667    0   NA   NA
[4,]  3.0 4.000000 3.000000   NA   NA   NA
[5,]  4.8 4.800000       NA   NA   NA   NA
[6,]  6.0       NA       NA   NA   NA   NA

# here I was expecting the first 2 rows in the previous call,
1> costMatrix(x, maxk=2)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    0    0    0    0    0
[2,]    0   NA   NA   NA   NA   NA

So the returning matrix will be ok when maxk is equal to the length/rows
of the input vector/matrix, but only having the first column filled in
the last row when maxk is less. 
Subsequently this will render inaccurate estimates when using the wrong matrix, 
since in the c code all elements with wrong NA will not be add to the cost function.


# only looking for one change, 2 segments, biased towards n-1
> .Call("findsegments", costMatrix(x,maxk=3) ,as.integer(2), as.integer(1), PACKAGE="tilingArray")$th
     [,1] [,2]
[1,]    7    0
[2,]    6    7
> segment(x,maxk=3,2)@breakpoints[[2]]
     estimate
[1,]        6

# here give a larger value for maxk then subset the first maxk-1 rows, which are accurate.
> .Call("findsegments", costMatrix(x,maxk=4)[1:3,] ,as.integer(2), as.integer(1), PACKAGE="tilingArray")$th
     [,1] [,2]
[1,]    7    0
[2,]    4    7


-- 
Regards
Yang Du



More information about the Bioconductor mailing list