[R] Applying user function over a large matrix

Sudipta Sarkar ssarkar at lanworth.com
Wed Apr 30 15:42:31 CEST 2008


Dear Folks,
Thanks for all your replies and suggestions. I will try them
out today and let you know how it goes.
Please let me know if you can think of anything else that might
resolve the issue.
Regards

---- Original message ----
>Date: Tue, 29 Apr 2008 15:43:41 -0700
>From: Bert Gunter <gunter.berton at gene.com>  
>Subject: Re: [R] Applying user function over a large matrix  
>To: "'Ray Brownrigg'" <Ray.Brownrigg at mcs.vuw.ac.nz>, <r-help at r-project.org>
>Cc: "'Tony Plate'" <tplate at acm.org>
>
>If you can (one-dimensional only), try using lowess() instead,
>probably in a for loop as Ray suggested.
>
>loess() is more powerful and flexible, but you pay for it in extra
>complexity and time. Maybe in this case, it's not worth it.
>
>-- Bert Gunter
>Genentech
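
A minimal sketch of the lowess()-in-a-for-loop idea suggested above. The matrix size, the name `m`, and the equally spaced `time_index` are assumptions made for illustration; the original data are not shown in the thread. The values of `f` and `iter` are simply the lowess() defaults written out.

```r
# Small stand-in for the 21 x 9000000 matrix described in the thread.
set.seed(1)
n_time   <- 21
n_series <- 1000                 # assumed size, kept small so the sketch runs quickly
m <- matrix(runif(n_time * n_series), nrow = n_time)

time_index <- seq_len(n_time)    # assumption: equally spaced time points

# Preallocate the result, then smooth one column at a time.
smoothed <- matrix(NA_real_, nrow = n_time, ncol = n_series)
for (j in seq_len(n_series)) {
  # lowess() returns a list with components x and y. Because time_index is
  # already sorted, y lines up with the rows of m.
  smoothed[, j] <- lowess(time_index, m[, j], f = 2/3, iter = 3)$y
}

dim(smoothed)
```

Note that lowess() and loess() use different defaults and will not give identical fits, so the smoothing parameters would need to be checked against the results you expect from loess().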
>
>-----Original Message-----
>From: r-help-bounces at r-project.org
>[mailto:r-help-bounces at r-project.org] On Behalf Of Ray Brownrigg
>Sent: Tuesday, April 29, 2008 3:19 PM
>To: r-help at r-project.org
>Cc: Tony Plate
>Subject: Re: [R] Applying user function over a large matrix
>
>In addition to Tony's suggestion, have a look at the following
>timings. I suspect the difference arises because the call to apply()
>duplicates your 1.5GB matrix, whereas the for loop doesn't [I stand
>to be corrected here].
>
>> x <- matrix(runif(210000), 21)
>> unix.time({res <- numeric(ncol(x)); for(i in 1:length(res)) res[i] <- sum(x[, i])})
>   user  system elapsed
>  0.079   0.000   0.079
>> unix.time(apply(x, 2, sum))
>   user  system elapsed
>   0.10    0.01    0.11
>> x <- matrix(runif(2100000), 21)
>> unix.time({res <- numeric(ncol(x)); for(i in 1:length(res)) res[i] <- sum(x[, i])})
>   user  system elapsed
>  0.791   0.010   0.801
>> unix.time(apply(x, 2, sum))
>   user  system elapsed
>  1.096   0.011   1.107
>> x <- matrix(runif(21000000), 21)
>> unix.time({res <- numeric(ncol(x)); for(i in 1:length(res)) res[i] <- sum(x[, i])})
>   user  system elapsed
>  7.825   0.011   7.840
>> unix.time(apply(x, 2, sum))
>   user  system elapsed
> 15.431   0.142  15.592
>
>Also, preliminary checking using the top utility shows the for loop
>requires just over half the memory of the apply() call.  This is on a
>NetBSD system with 2GB memory.
>
>HTH,
>Ray Brownrigg
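
The comparison in the transcript above can be re-run along these lines. This is a sketch, not a reproduction of the original session: it uses system.time() (unix.time() was an older alias for it), a smaller matrix so that it finishes quickly, and adds a vapply() variant that is not part of the original message. Timings depend on the machine and R version, so none are quoted here.

```r
set.seed(1)
x <- matrix(runif(21 * 100000), nrow = 21)   # assumed size, smaller than in the thread

# Preallocated for loop, as in the transcript.
t_loop <- system.time({
  res_loop <- numeric(ncol(x))
  for (i in seq_along(res_loop)) res_loop[i] <- sum(x[, i])
})

# apply() over columns.
t_apply <- system.time(res_apply <- apply(x, 2, sum))

# vapply() over column indices: an extra variant added here for comparison.
t_vapply <- system.time(
  res_vapply <- vapply(seq_len(ncol(x)), function(i) sum(x[, i]), numeric(1))
)

# Show user, system, and elapsed times side by side.
print(rbind(loop = t_loop, apply = t_apply, vapply = t_vapply)[, 1:3])

# All three approaches should agree on the result.
stopifnot(isTRUE(all.equal(res_loop, res_apply)),
          isTRUE(all.equal(res_loop, res_vapply)))
```

Here sum() is only a cheap stand-in for the real per-column function; with a costly function such as loess(), the per-call cost will likely dominate the looping overhead.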
>
>On Wed, 30 Apr 2008, Tony Plate wrote:
>> It's quite possible that much of the time spent in loess() is setting
>> up the data (i.e., the formula, terms, model.frame, etc.), and that
>> much of that is repeated identically for each call to loess().  I
>> would suggest looking at the code of loess(), working out what
>> arguments it calls simpleLoess() with, and then trying to call
>> stats:::simpleLoess() directly.  (Of course you have to be careful
>> with this because it does not use the published API.)
>>
>> -- Tony Plate
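
A hedged sketch of what calling stats:::simpleLoess() directly might look like. simpleLoess() is unexported, so its argument list is not guaranteed and may differ between R versions; the call below uses only the arguments `y`, `x`, `weights`, `span`, and `degree`, and assumes the remaining defaults match those of loess() for a Gaussian fit. Print the argument list in your own installation before relying on it. The data are made up for illustration.

```r
set.seed(1)
x <- 1:21
y <- sin(x / 4) + rnorm(21, sd = 0.1)   # made-up series of length 21

# Inspect the internal function's arguments in this R version first.
print(args(stats:::simpleLoess))

# Public API: goes through the formula / model.frame machinery on every call.
fit_public <- loess(y ~ x, span = 0.75, degree = 2)

# Direct call to the unexported worker, skipping the formula setup.
# Assumption: these argument names exist in the installed version of R.
fit_direct <- stats:::simpleLoess(
  y       = y,
  x       = matrix(x, ncol = 1),
  weights = rep(1, length(y)),
  span    = 0.75,
  degree  = 2L
)

# Compare the fitted values from the two routes.
print(all.equal(as.numeric(fit_public$fitted), as.numeric(fit_direct$fitted)))
```

Whether this saves a worthwhile amount of time would need to be measured with system.time() on realistic data.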
>>
>> Sudipta Sarkar wrote:
>> > Respected R experts,
>> > I am trying to apply a user function that calls the R loess
>> > function from the stats package on each time series. I have a
>> > large matrix of size 21 x 9000000 and I need to apply loess to
>> > each column, so I have written a user function foo that does this,
>> > and I call it as follows:
>> > xc<-apply(t,2,foo) where t is my 21 x 9000000 matrix and foo
>> > calls loess. This is turning out to be a very slow process and I
>> > need to repeat this step for 25-30 such large matrix chunks.
>> > Is there any trick I can use to make this work faster?
>> > Any help will be deeply appreciated.
>> > Regards
>> >
>> >
>> > Sudipta Sarkar PhD
>> > Senior Analyst/Scientist
>> > Lanworth Inc. (Formerly Forest One Inc.)
>> > 300 Park Blvd., Ste 425
>> > Itasca, IL
>> > Ph: 630-250-0468
>> >
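
For reference, the setup described in the original question might look roughly like the sketch below. The body of `foo` was never posted, so this version is hypothetical, as are the small matrix size and the `time_index` vector. The matrix is named `tm` here rather than `t` to avoid masking base R's t() function.

```r
set.seed(1)
n_time <- 21
tm <- matrix(runif(n_time * 200), nrow = n_time)   # small stand-in for the 21 x 9000000 matrix
time_index <- seq_len(n_time)                      # assumption: equally spaced time points

# Hypothetical version of the poster's foo(): fit loess to one column
# and return the smoothed values.
foo <- function(y) {
  fitted(loess(y ~ time_index))
}

# One loess() call per column; this is the step reported as slow.
xc <- apply(tm, 2, foo)
dim(xc)
```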
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.


Sudipta Sarkar PhD
Senior Analyst/Scientist
Lanworth Inc. (Formerly Forest One Inc.)
300 Park Blvd., Ste 425
Itasca, IL
Ph: 630-250-0468


