[R] Help with apply

Mon Oct 4 21:03:32 CEST 2010

Hi,

It seemed to me like the calculations were implemented in a bit of a
redundant way, so I tried to simplify it algebraically.  doit1 is
Phil's version of your original calculations, and doit2 is my
simplified way.

doit1 <- function(i, j) {
  exp(sum(log((exp(tmp[i,] * (qq$nodes[j]-b)))/
              (factorial(tmp[i,])*exp(exp(qq$nodes[j]-b))))))*
  ((1/(s*sqrt(2*pi)))*exp(-((qq$nodes[j]-0)^2/(2*s^2))))/
  dnorm(qq$nodes[j])*qq$weights[j]
}

doit2 <- function(i, j) {
  (prod(exp((tmp[i,]*(qq$nodes[j]-b))-exp(qq$nodes[j]-b))/
        factorial(tmp[i,]))*qq$weights[j])/
  (s*sqrt(2*pi)*dnorm(qq$nodes[j])*exp((qq$nodes[j]-0)^2/(2*s^2)))
}

system.time(t(outer(1:300,1:2,Vectorize(doit1))))
system.time(t(outer(1:300,1:2,Vectorize(doit2))))

New vs. Old
Characters (for the calculations):
154 vs. 177
Function calls for calculations (i.e., excluding subseting/dnorm, etc.):
22 vs. 27

Timing difference:
Not appreciable on my system, but I spent so long wading through
exactly what was happening, I figured why not share it.  Moving tmp up
to 300 rows, I got:

> system.time(t(outer(1:300,1:2,Vectorize(doit1))))
   user  system elapsed
  9.044   0.008  10.135
> system.time(t(outer(1:300,1:2,Vectorize(doit2))))
   user  system elapsed
  8.413   0.008   8.893

At this rate, the difference between a few million rows might save you
enough time you can stretch your hands once, and those 20 characters
should really free up space on your hard disk....sigh.

Josh

On Mon, Oct 4, 2010 at 11:22 AM, Doran, Harold <HDoran at air.org> wrote:
> Ahhhh, I see that you wrapped apply around mapply. I was toying with both; but didn't think to use mapply inside apply. As always, thank you, Phil
>
> -----Original Message-----
> From: Phil Spector [mailto:spector at stat.berkeley.edu]
> Sent: Monday, October 04, 2010 2:20 PM
> To: Doran, Harold
> Cc: Gabriela Cendoya; R-help
> Subject: Re: [R] Help with apply
>
> Harold -
>    The first way that comes to mind is
>
> doit = function(i,j)exp(sum(log((exp(tmp[i,]*(qq$nodes[j]-b))) /
>        (factorial(tmp[i,]) * exp(exp(qq$nodes[j]-b)))))) *
>        ((1/(s*sqrt(2*pi)))  * exp(-((qq$nodes[j]-0)^2/(2*s^2))))/
>        dnorm(qq$nodes[j]) * qq$weights[j]
> t(outer(1:3,1:2,Vectorize(doit)))
>
> but you said you wanted to use apply.  That leads to
>
> doit1 = function(tmpi,nod,wt)
>       exp(sum(log((exp(tmpi*(nod-b))) / (factorial(tmpi) *
>       exp(exp(nod-b)))))) * ((1/(s*sqrt(2*pi)))  *
>       exp(-((nod-0)^2/(2*s^2))))/dnorm(nod) * qq$weights[j]
> apply(tmp,1,function(x)mapply(function(n,w)doit1(x,n,w),qq$nodes,qq$weights))
>
> Both seem to agree with your rr1.
>
>                                        - Phil Spector
>                                         Statistical Computing Facility
>                                         Department of Statistics
>                                         UC Berkeley
>                                         spector at stat.berkeley.edu
>
> On Mon, 4 Oct 2010, Doran, Harold wrote:
>
>> Sorry about that; s <- 1
>>
>> -----Original Message-----
>> From: Gabriela Cendoya [mailto:gabrielacendoya.rlist at gmail.com]
>> Sent: Monday, October 04, 2010 1:44 PM
>> To: Doran, Harold
>> Cc: R-help
>> Subject: Re: [R] Help with apply
>>
>> You are missing "s" in your definitions so I can't reproduce your code.
>>
>>> tmp <- data.frame(var1 = sample(c(0:10), 3, replace = TRUE), var2 = sample(c(0:10), 3, replace = TRUE), var3 = sample(c(0:10), 3, replace = TRUE))
>>>
>>> str(tmp)
>> 'data.frame':   3 obs. of  3 variables:
>> $ var1: int  9 3 9
>> $ var2: int  4 6 2
>> $ var3: int  2 9 3
>>>
>>> #I can run the following double loop and yield what I want in the end (rr1) as:
>>>
>>> library(statmod)
>>> Q <- 2
>>> b <- runif(3)
>>> qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
>>> rr1 <- matrix(0, nrow = Q, ncol = nrow(tmp))
>>> L <- nrow(tmp)
>>> for(j in 1:Q){
>> +     for(i in 1:L){
>> +         rr1[j,i] <- exp(sum(log((exp(tmp[i,]*(qq$nodes[j]-b))) /
>> (factorial(tmp[i,]) *
>> +         exp(exp(qq$nodes[j]-b)))))) * ((1/(s*sqrt(2*pi)))  *
>> exp(-((qq$nodes[j]-0)^2/(2*s^2))))/dnorm(qq$nodes[j]) * qq$weights[j]
>> +                                                }
>> +                                }
>> Error: objeto 's' no encontrado
>>> rr1
>>     [,1] [,2] [,3]
>> [1,]    0    0    0
>> [2,]    0    0    0
>>>
>>
>> Gabriela
>>
>>
>> 2010/10/4, Doran, Harold <HDoran at air.org>:
>>> Suppose I have the following data:
>>>
>>> tmp <- data.frame(var1 = sample(c(0:10), 3, replace = TRUE), var2 =
>>> sample(c(0:10), 3, replace = TRUE), var3 = sample(c(0:10), 3, replace =
>>> TRUE))
>>>
>>> I can run the following double loop and yield what I want in the end (rr1)
>>> as:
>>>
>>> library(statmod)
>>> Q <- 2
>>> b <- runif(3)
>>> qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1)
>>> rr1 <- matrix(0, nrow = Q, ncol = nrow(tmp))
>>> L <- nrow(tmp)
>>>                 for(j in 1:Q){
>>>                                                 for(i in 1:L){
>>>                                                                 rr1[j,i] <-
>>> exp(sum(log((exp(tmp[i,]*(qq$nodes[j]-b))) / (factorial(tmp[i,]) *
>>> exp(exp(qq$nodes[j]-b)))))) *
>>>
>>> ((1/(s*sqrt(2*pi)))  * exp(-((qq$nodes[j]-0)^2/(2*s^2))))/dnorm(qq$nodes[j])
>>> * qq$weights[j]
>>>                                                 }
>>>                                 }
>>> rr1
>>>
>>> But, I think this is so inefficient for large Q and nrow(tmp). The function
>>> I am looping over is:
>>>
>>> fn <- function(x, nodes, weights, s, b) {
>>>                 exp(sum(log((exp(x*(nodes-b))) / (factorial(x) *
>>> exp(exp(nodes-b)))))) *
>>>                 ((1/(s*sqrt(2*pi)))  *
>>> exp(-((nodes-0)^2/(2*s^2))))/dnorm(nodes) * weights
>>>                 }
>>>
>>> I've tried using apply in a few ways, but I can't replicate rr1 from the
>>> double loop. I can go through each value of Q one step at a time and get
>>> matching values in the rr1 table, but this would still require a loop and
>>> that I think can be avoided.
>>>
>>> apply(tmp, 1, fn, nodes = qq$nodes[1], weights = qq$weights[1], s = 1, b =
>>> b)
>>>
>>> Does anyone see an efficient way to compute rr1 without a loop?
>>>
>>> Harold
>>>
>>>> sessionInfo()
>>> R version 2.10.1 (2009-12-14)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>>> States.1252    LC_MONETARY=English_United States.1252
>>> [4] LC_NUMERIC=C                           LC_TIME=English_United
>>> States.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] minqa_1.1.9     Rcpp_0.8.6      MiscPsycho_1.6  statmod_1.4.6
>>> lattice_0.17-26 gdata_2.8.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] grid_2.10.1  gtools_2.6.2 tools_2.10.1
>>>
>>>      [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>> --
>> _________________________
>> Lic. María Gabriela Cendoya
>> Magíster en Biometría
>> Profesor Adjunto
>> Facultad de Ciencias Agrarias
>> UNMdP - Argentina
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/