[R] speeding up a loop

jim holtman jholtman at gmail.com
Fri Oct 18 21:49:17 CEST 2013


When the system locks up, what do you see in the Task Manager?  Is it
consuming CPU and memory?  On the example data you sent, you won't get
a match on the time since there is no match for the first entry in
df1 in the 'b' dataframe.  This leads to an error that you are not
checking for.  Have you tried it with a small subset to see if it
locks up in the same way?  Put a counter in the loop so that every 'n'
iterations the value of 'i' is printed out.  Make sure you have
'flush.console()' after the print statement to ensure it gets to the
GUI even if you have the writes buffered.  You should be able to debug
with some of these pointers.
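
Something along these lines is roughly what I have in mind.  It is
only a sketch: the toy 'b' and 'df1' tables, the reporting interval
'n', and the choice to leave NA for rows without a match are all
assumptions for illustration, not your real data or logic.

```r
# Toy stand-ins for the real tables (assumed values, not the posted data)
b   <- data.frame(Time2 = seq(0, by = 60, length.out = 10), Power = 1:10)
df1 <- data.frame(start = c(60, 75, 120), end = c(180, 240, 300))

n <- 2                  # report progress every n iterations (assumed)
df1$new <- NA_real_     # preallocate the result column

for (i in seq_len(nrow(df1))) {
    begin <- which(b$Time2 == df1$start[i])
    end   <- which(b$Time2 == df1$end[i])

    # which() returns integer(0) when nothing matches; begin:end would
    # then stop with 'argument of length 0', so only sum on a clean match
    if (length(begin) == 1 && length(end) == 1) {
        df1$new[i] <- sum(b$Power[begin:end])
    }

    # periodic progress report
    if (i %% n == 0) {
        cat("processed row", i, "of", nrow(df1), "\n")
        flush.console()   # push buffered output to the GUI
    }
}
```

With the guard in place, rows that have no match simply stay NA
instead of stopping the loop, so you can count them afterwards with
sum(is.na(df1$new)).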

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Oct 18, 2013 at 1:07 PM, Ye Lin <yelin at lbl.gov> wrote:
> Thanks for your advice Jim!
>
> I tried Rprof but since the code just freezes the system, I am not able to
> get results so far as I had to close R after waiting for a long time. I am
> confused that the same code would work differently on the same system.
>
> I tried out the foreach package as well but didn't notice significant
> improvement. Is it that my code is not efficient, or is there something
> wrong or something changed with my system?
>
> Thanks!
>
>
>
> On Fri, Oct 18, 2013 at 7:14 AM, jim holtman <jholtman at gmail.com> wrote:
>>
>> You might want to use the profiler (Rprof) on a subset of your code to
>> see where time is being spent.  Find a subset that runs for a minute,
>> or so, and enable profiling for the test.  Take a look and see which
>> functions are taking the time. This will be a start.  You can also
>> watch the task monitor while the application is running to see how
>> fast it is using the CPU and memory.  If you are going around a loop a
>> number of times, you can put some monitoring 'cat' statements that
>> will periodically print out the memory and CPU used.  So these are
>> some of the techniques to start looking at things in your program.
>> Also data.frames are very costly to 'index' into.  You might want to
>> consider converting to a matrix (where possible since all columns have
>> to have the same mode).  This can provide significant improvement.
>> This is something that you will be able to see when you use the
>> profiling tool since it will probably show a lot of time in the
>> functions that handle dataframes.
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>>
>> On Fri, Oct 18, 2013 at 9:23 AM, Ye Lin <yelin at lbl.gov> wrote:
>> > Thanks for your help David!
>> >
>> > I was running the same code the other day and it worked fine although it
>> > took a while as well. You are right that dff should be df1, and maybe
>> > because it's only a portion of my data it gives the error of length 0.
>> >
>> > About CPU usage, I got it by clicking ctrl+alt+delete and it showed CPU
>> > usage is really high. Is there any way to figure out why R is taxing my
>> > system?
>> >
>> > Thanks!
>> >
>> > Ye
>> >
>> > On Thursday, October 17, 2013, David Winsemius wrote:
>> >
>> >>
>> >> On Oct 17, 2013, at 2:56 PM, Ye Lin wrote:
>> >>
>> >> > Hey R professionals,
>> >> >
>> >> > I have a large dataset and I want to run a loop on it, basically
>> >> > creating a new column which gathers information from another
>> >> > reference table.
>> >> >
>> >> > When I run the code, R just freezes and does not even respond after
>> >> > 30 min, which is really unusual. I tried sapply as well, but it did
>> >> > not improve at all.
>> >> >
>> >> > I am running R 3.0.2 on Windows 7.  I checked the system: when I run
>> >> > the code, my CPU usage is about 25%-30%, which is taxing my desktop.
>> >>
>> >> A guess: It's not your CPU use ... it's your RAM use. You've probably
>> >> exhausted your RAM and your system has paged out to virtual memory.
>> >> >
>> >> > Here is my code:
>> >> >
>> >> > #df1 is the data set I want to add a new column#
>> >> > #b is the reference table#
>> >> >
>> >> > for (i in (1:nrow(df1))) {
>> >> >  begin=which(b$Time2==df1$start[i] & b$Date==df1$Date[i])
>> >> >  date=unlist(strsplit(as.character(dff$end[i])," "))[1]
>> >> >   end=ifelse(date=="2013-10-17",
>> >> >   which(b$Time2==df1$end[i] & b$Date==df1$Date[i]),
>> >> >   which(b$Time2==df1$end[i]-3600*24 &
>> >> > b$Date==as.Date(df1$Date[i])+1))
>> >> >    df1$new[i] <- sum(b[begin:end,]$Power)
>> >> > }
>> >> >
>> >>
>> >> I get:
>> >> Error in strsplit(as.character(dff$end[i]), " ") : object 'dff' not
>> >> found
>> >>
>> >> If I change the dff to df1, I get:
>> >> Error in begin:end : argument of length 0
>> >>
>> >> --
>> >> David.
>> >> > And here is a mimic sample of df1 & b:
>> >> >
>> >> > df1 <- structure(list(Date = structure(c(1369699200, 1369699200,
>> >> > 1369699200,
>> >> > 1369699200, 1369699200), tzone = "UTC", class = c("POSIXct",
>> >> > "POSIXt")), start = structure(c(1381991205, 1381990247, 1382010454,
>> >> > 1382007281, 1381992288), tzone = "UTC", class = c("POSIXct",
>> >> > "POSIXt")), end = structure(c(1381992405, 1381993727, 1382010694,
>> >> > 1382007461, 1381992468), tzone = "UTC", class = c("POSIXct",
>> >> > "POSIXt"))), .Names = c("Date", "start", "end"), row.names = c(NA,
>> >> > -5L), class = "data.frame")
>> >> >
>> >> >
>> >> > b <- structure(list(Date = structure(c(1369699200, 1369699200,
>> >> > 1369699200,
>> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> >> > 1369699200,
>> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> >> > 1369699200,
>> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> >> > 1369699200,
>> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> >> > 1369699200,
>> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> >> > 1369699200,
>> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> >> > 1369699200,
>> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200,
>> >> > 1369699200,
>> >> > 1369699200, 1369699200, 1369699200, 1369699200, 1369699200), tzone =
>> >> > "UTC",
>> >> > class = c("POSIXct",
>> >> > "POSIXt")), Time2 = structure(c(1381989634, 1381989694, 1381989754,
>> >> > 1381989814, 1381989874, 1381989934, 1381989994, 1381990054,
>> >> > 1381990114,
>> >> > 1381990174, 1381990234, 1381990294, 1381990354, 1381990414,
>> >> > 1381990474,
>> >> > 1381990534, 1381990594, 1381990654, 1381990714, 1381990774,
>> >> > 1381990834,
>> >> > 1381990894, 1381990954, 1381991014, 1381991074, 1381991134,
>> >> > 1381991194,
>> >> > 1381991254, 1381991314, 1381991374, 1381991434, 1381991494,
>> >> > 1381991554,
>> >> > 1381991614, 1381991674, 1381991734, 1381991794, 1381991854,
>> >> > 1381991914,
>> >> > 1381991974, 1381992034, 1381992094, 1381992154, 1381992214,
>> >> > 1381992274,
>> >> > 1381992334, 1381992394, 1381992454, 1381992514, 1381992574), tzone =
>> >> > "UTC",
>> >> > class = c("POSIXct",
>> >> > "POSIXt")), Power = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
>> >> > 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
>> >> > 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
>> >> > 45, 46, 47, 48, 49, 50)), .Names = c("Date", "Time2", "Power"
>> >> > ), row.names = c(NA, -50L), class = "data.frame")
>> >> >
>> >> > Thanks for your help!
>> >> >
>> >> >       [[alternative HTML version deleted]]
>> >> >
>> >> > ______________________________________________
>> >> > R-help at r-project.org mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> > PLEASE do read the posting guide
>> >> > http://www.R-project.org/posting-guide.html
>> >> > and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >> David Winsemius
>> >> Alameda, CA, USA
>> >>
>> >>
>> >
>
>
