[R] [fixed] vectorized nested loop: apply a function that takes two rows
Peter Dalgaard
P.Dalgaard at biostat.ku.dk
Wed Jan 24 15:35:15 CET 2007
Jose Quesada wrote:
> Thanks Charles, Martin,
>
> Substantial improvement with the vectorized solution. Here is a quick benchmark:
>
> # The loop-based solution:
> nestedCos = function (x) {
> if (is(x, "Matrix") ) {
> cos = array(NA, c(ncol(x), ncol(x)))
> for (i in 2:ncol(x)) {
> for (j in 1:(i - 1)) {
> cos[i, j] = cosine(x[, i], x[, j])
> }
> }
> }
> return(cos)
> }
> # Charles C. Berry's vectorized approach
> flatCos = function (x) {
> res = crossprod( x , x )
> diagnl = Diagonal( ncol(x), 1 / sqrt( diag( res )))
> cos = diagnl %*% res %*% diagnl
> return(cos)
> }
>
> Benchmarking:
>
>
>> system.time(for(i in 1:10)nestedCos(x))
>>
> (I stopped because it was taking too long)
> Timing stopped at: 139.37 3.82 188.76 NA NA
>
>> system.time(for(i in 1:10)flatCos(x))
>>
> [1] 0.43 0.00 0.48 NA NA
>
> #------------------------------------------------------
> As much as I like to have faster code, I'm still wondering WHY flatCos gets the same results; i.e., why multiplying the inverse sqrt root of the diagonal of x BY x, then BY the diagonal again produces the expected result. I checked the wikipedia page for crossprod and other sources, but it still eludes me. I can see that scaling by the sqrt of the diagonal once makes sense with 'res <- crossprod( x , x ) gives your result up to scale factors of sqrt(res[i,i]*res[j,j])', but I still don't see why you need to postmultiply by the diagonal again.
>
>
Didn't follow this thread too closely, but the point would seem to be
that the inner product of two normalized vectors is the cosine of the
angle. So basically, you want
crossprod(X%*%diagnl, X%*%diagnl) == t(diagnl) %*% t(X) %*% X %*% diagnl
I think, BTW, that another version not requiring Matrix is
Cr <- crossprod(X)
D <- sqrt(diag(C))
Cr/outer(D, D)
> Maybe trying to attack a simpler problem might help my understanding: e.g., calculating the cos of a column to all other colums of x (that is, the inner part of the nested loop). How would that work in a vectorized way? I'm trying to get some general technique that I can reuse later from this excellent answer.
>
> Thanks,
> -Jose
>
>
>> I am rusty on 'Matrix', but I see there are crossprod methods for those
>> classes.
>>
>> res <- crossprod( x , x )
>>
>> gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so
>> something like
>>
>> diagnl <- Diagonal( ncol(x), sqrt( diag( res ) )
>>
>>
>
> OOPS! Better make that
>
> diagnl <- Diagonal( ncol(x), 1 / sqrt( diag( res ) )
>
>
>> final.res <- diagnl %*% res %*% diagnl
>>
>> should do it.
>>
>>
>
>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
More information about the R-help
mailing list