[R] Help with speed (replacing the loop?)

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Jan 11 16:28:05 CET 2012


Hi,

On Wed, Jan 11, 2012 at 9:57 AM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Dear R-ers,
>
> I have a loop below that loops through my numeric variables in data
> frame x and through levels of the factor "group" and multiplies (group
> by group) the values of numeric variables in x by the corresponding
> group-specific values from data frame y. In reality, my:
> dim(x) is 300,000 rows by 100 variables, and
> dim(y) is 120 levels of "group" by 100 variables.
> So, my huge data frame x takes up a lot of space in memory. This is
> why I am actually replacing values of "a" and "b" in x with newly
> calculated values, rather than adding them.
> The code does what I need, but it takes forever.
>
> Is there maybe a more speedy way to achieve what I need?
> Thanks a lot!

Here's an all-middle-steps-included way to do so using data.table. If
you use more data.table-centric idioms (the `:=` operator and other
ways to `merge`) you can likely eke out lower memory use and higher
speed, but I'll leave it like so for pedagogical purposes ;-)

====
library(data.table)

## your data
xx <- data.table(group=c(rep("group1",5),rep("group2",5)),
                 a=1:10, b=seq(10,100,by=10), key="group")
yy <- data.table(group=c("group1","group2"), a=c(10,20), b=c(2,3),
                 key="group")

## temp data.table to get your ducks in a row
m <- merge(xx, yy, by="group", suffixes=c(".x", ".y"))

## your answers will be in the aa and bb columns
result <- transform(m, aa=a.x * a.y, bb=b.x * b.y)

====
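
For reference, the `:=` route mentioned above might look roughly like
the sketch below. It is an illustration rather than a tuned solution,
and it assumes a data.table version recent enough to support the `on=`
argument and the `i.` column prefix in joins. Column `a` is created as
numeric here on the assumption that the products should be stored as
doubles.

```r
library(data.table)

## same toy data as above; "a" is numeric so the products fit the column type
xx <- data.table(group = rep(c("group1", "group2"), each = 5),
                 a = as.numeric(1:10), b = seq(10, 100, by = 10))
yy <- data.table(group = c("group1", "group2"), a = c(10, 20), b = c(2, 3))

## update join: match each row of xx to its group's row in yy and
## overwrite a and b by reference (i.a and i.b are yy's columns)
xx[yy, on = "group", `:=`(a = a * i.a, b = b * i.b)]

print(xx)
```

Because `:=` modifies xx by reference, no merged copy of the big table
is created, which is the point when xx is 300,000 rows by 100 columns.
With 100 variables you would want to build the `:=` call
programmatically rather than type it out.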

Truth be told, if you use normal data.frames, the code will look very
similar to the above, so you can try that, too.
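
A sketch of the plain data.frame version, using only base R. The first
part mirrors the merge/transform steps. The second part is a different
technique, not the one in the example above: it uses match() to look up
each row's group in y and overwrites the columns of x directly, on the
assumption that every group in x also appears in y. The names x, y,
vars and idx are only for illustration.

```r
## same toy data as plain data.frames
x <- data.frame(group = rep(c("group1", "group2"), each = 5),
                a = 1:10, b = seq(10, 100, by = 10))
y <- data.frame(group = c("group1", "group2"), a = c(10, 20), b = c(2, 3))

## 1) merge + transform, as with data.table
m <- merge(x, y, by = "group", suffixes = c(".x", ".y"))
result <- transform(m, aa = a.x * a.y, bb = b.x * b.y)

## 2) alternative without the merged copy: row index into y for each row of x
vars <- c("a", "b")
idx <- match(x$group, y$group)

## loop over variables only (100 iterations in the real data), not over groups
for (v in vars) {
  x[[v]] <- x[[v]] * y[[v]][idx]
}

print(x)
```

The second variant replaces the values of a and b in x, as in the
original question, and the remaining loop is vectorised over all rows
at once.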

HTH,
-steve


-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


