[R] Memory Utilization in R

R. Michael Weylandt michael.weylandt at gmail.com
Tue Mar 27 18:15:58 CEST 2012


Note that you can actually drop the line defining the big list "x". I
thought it would be needed, but it turns out to be unnecessary after
cleaning up the second half; skipping that allocation might save you
even more time.
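
With that line gone, the whole thing collapses to just this (a sketch,
using the same objects as in your script below):

poi <- as.character(top.GSM396290)  # the 5000 names of interest
x.data <- h1[, c(1, 7:9)]           # 485577 obs. of 4 variables

a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")
x <- x.data[vapply(a, function(x) any(poi %in% x), logical(1)), ]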

Best,
Michael

On Tue, Mar 27, 2012 at 11:14 AM, Kurinji Pandiyan
<kurinji.pandiyan at gmail.com> wrote:
> Thank you for the modified script! I have now tried on different datasets
> and it works very well and is dramatically faster than my original script!
>
> I really appreciate the help.
> Kurinji
>
> On Fri, Mar 23, 2012 at 1:33 PM, R. Michael Weylandt
> <michael.weylandt at gmail.com> wrote:
>>
>> Taking a look at your script: there are some potential optimizations
>> you can make:
>>
>>  # Fine
>> poi <- as.character(top.GSM396290) #5000 characters
>> x.data <- h1[,c(1,7:9)] # 485577 obs of 4 variables
>>
>> # Pre-allocate the space
>> x <- vector("list", 485577) # x <- list()
>>
>> # Do the "a" stuff once, outside the loop, so you aren't doing it
>> # 485577 times
>> a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")
>>
>> # Let's use an apply statement instead of a for loop;
>> # vapply is the fastest since we pre-specify the return type.
>> x.data[vapply(a, function(x) any(poi %in% x), logical(1)), ]
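>>
>> # A quick sanity check on toy data (made-up gene names, not your h1);
>> # rows 1 and 3 should come back, since both contain a gene in poi.toy:
>> toy <- data.frame(UCSC_REFGENE_NAME = c("TP53;BRCA1", "GAPDH", "BRCA1"))
>> poi.toy <- "BRCA1"
>> a.toy <- strsplit(as.character(toy$UCSC_REFGENE_NAME), ";")
>> toy[vapply(a.toy, function(x) any(poi.toy %in% x), logical(1)), , drop = FALSE]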
>>
>> I think this will do what you wanted, and hopefully much faster.
>>
>> Note that you could probably tune this further, but I think this
>> strikes a good balance between clarity and performance (for now).
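>>
>> For instance, one further tweak (a sketch, untested on your data)
>> would be to unlist once and map hits back to row indices, avoiding
>> the per-row function calls entirely:
>>
>> flat <- unlist(a)  # every gene name in one long vector
>> row.id <- rep(seq_along(a), vapply(a, length, integer(1)))  # source row of each name
>> x.data[unique(row.id[flat %in% poi]), ]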
>>
>> Hope this helps,
>>
>> Michael
>>
>> On Fri, Mar 23, 2012 at 11:52 AM, Kurinji Pandiyan
>> <kurinji.pandiyan at gmail.com> wrote:
>> >
>> > Thank you for the input.
>> >
>> > As it turns out, I realized that my script is using a lot more
>> > memory than I claimed - it was initially using 3 GB but has gone up
>> > to 20.24 GB active, with 29.63 GB assigned to the R session.
>> >
>> > The script has run overnight and I don't think it is active anymore,
>> > since I keep getting an error message that I am out of startup disk
>> > space for application memory.
>> >
>> > I am attaching screenshots of my RAM usage distribution (given that
>> > there is no fluctuation in the R session's usage, I believe it is no
>> > longer running) and of my available hard drive space.
>> >
>> > Here is my script -
>> >
>> > poi <- as.character(top.GSM396290) #5000 characters
>> > x.data <- h1[,c(1,7:9)] # 485577 obs of 4 variables
>> > head(x.data)
>> >
>> > x <- list()
>> >
>> > for(i in 1:485577){
>> >   a <- as.character(x.data[i, "UCSC_REFGENE_NAME"])
>> >   a <- unlist(strsplit(a, ";"))
>> >   if(any(poi %in% a) == TRUE) {x[[i]] <- x.data[i,]}
>> > }
>> >
>> >  # this step completed in a few hours
>> >
>> > x <- do.call(rbind, x)  # this step has been running overnight and
>> > # is still stuck
>> >
>> > Thanks, I really appreciate the help.
>> > Kurinji
>> >
>> > On Thu, Mar 22, 2012 at 10:44 PM, R. Michael Weylandt
>> > <michael.weylandt at gmail.com> wrote:
>> >>
>> >> Well... what makes you think you are hitting memory constraints then?
>> >> If you have significantly less than 3 GB of data, it shouldn't surprise
>> >> you if R never needs more than 3 GB of memory.
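>> >>
>> >> A quick way to check how big your data actually is (h1 being the
>> >> big data frame from your script):
>> >>
>> >> print(object.size(h1), units = "Mb")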
>> >>
>> >> You could just be running your scripts inefficiently... it's an extreme
>> >> example, but all the memory and gigaflopping in the world can't speed
>> >> this up (by much):
>> >>
>> >> for(i in seq_len(1e6)) Sys.sleep(10)
>> >>
>> >> Perhaps you should look into profiling tools or parallel
>> >> computation... if you can post a representative example of your
>> >> scripts, we might be able to give performance pointers.
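>> >>
>> >> For example, with base R's built-in profiler (a minimal sketch; put
>> >> the slow part of your script where the comment is):
>> >>
>> >> Rprof("profile.out")
>> >> # ... run the code you want to profile here ...
>> >> Rprof(NULL)
>> >> summaryRprof("profile.out")  # shows where the time actually goes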
>> >>
>> >> Michael
>> >>
>> >> On Fri, Mar 23, 2012 at 1:33 AM, Kurinji Pandiyan
>> >> <kurinji.pandiyan at gmail.com> wrote:
>> >> > Yes, I am.
>> >> >
>> >> > Thank you,
>> >> > Kurinji
>> >> >
>> >> > On Mar 22, 2012, at 10:27 PM, "R. Michael Weylandt"
>> >> > <michael.weylandt at gmail.com> wrote:
>> >> >
>> >> >> Use 64-bit R?
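>> >> >>
>> >> >> (To check which build you're actually running:)
>> >> >>
>> >> >> .Machine$sizeof.pointer  # 8 on a 64-bit build, 4 on 32-bit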
>> >> >>
>> >> >> Michael
>> >> >>
>> >> >> On Thu, Mar 22, 2012 at 5:22 PM, Kurinji Pandiyan
>> >> >> <kurinji.pandiyan at gmail.com> wrote:
>> >> >>> Hello,
>> >> >>>
>> >> >>> I have a Mac Pro with 32 GB of RAM, a 2 x 2.4 GHz quad-core
>> >> >>> processor, and 2 TB of storage. Despite having so much memory, I
>> >> >>> am not able to get R to use much more than 3 GB. Some of my
>> >> >>> scripts take hours to run, but I would think they would be much
>> >> >>> faster if more memory were used. How do I optimize R's memory
>> >> >>> usage on my Mac Pro?
>> >> >>>
>> >> >>> Thank you!
>> >> >>> Kurinji
>> >> >>>
>> >> >>> ______________________________________________
>> >> >>> R-help at r-project.org mailing list
>> >> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >>> PLEASE do read the posting guide
>> >> >>> http://www.R-project.org/posting-guide.html
>> >> >>> and provide commented, minimal, self-contained, reproducible code.
>> >
>> >
>
>


