[R] Memory Utilization on R

Alekseiy Beloshitskiy abeloshitskiy at velti.com
Tue Mar 27 18:07:45 CEST 2012


Guys, let me add my two cents to your interesting discussion.

I have a ~10 GB text file with training data for my model: about 150 million
rows of 12 variables. When I load it into memory (by running just this one
line!):

train <- read.table(file = "/training.txt")

loading takes ~28 GB of RAM and about 2 hours to finish, and once the data are
loaded the R session holds ~14 GB. I can't even imagine how much it will take
when I run SVM training on this data set. Is there any optimization to reduce
the time required to load the data into memory?
I use a 32 GB RAM x64 box.
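
A minimal sketch of the usual read.table tuning, assuming the 12 columns are
all numeric and whitespace-delimited (colClasses and nrows are guesses here and
would need to match the real file):

## Sketch only: adjust colClasses to the real column types.
train <- read.table(file = "/training.txt",
                    colClasses = rep("numeric", 12), # skip per-column type guessing
                    nrows = 1.6e8,                   # mild overestimate aids pre-allocation
                    comment.char = "",               # don't scan for comments
                    quote = "")                      # don't scan for quotes

Once the file has parsed successfully, saving the result with saveRDS() and
reloading it later with readRDS() avoids repeating the text parse on every run.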

Thank you,
-Alex

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] on behalf of Kurinji Pandiyan [kurinji.pandiyan at gmail.com]
Sent: 27 March 2012 18:14
To: R. Michael Weylandt
Cc: r-help at r-project.org
Subject: Re: [R] Memory Utilization on R

Thank you for the modified script! I have now tried it on different datasets,
and it works very well - dramatically faster than my original script!

I really appreciate the help.
Kurinji

On Fri, Mar 23, 2012 at 1:33 PM, R. Michael Weylandt <
michael.weylandt at gmail.com> wrote:

> Taking a look at your script, there are some potential optimizations
> you can make:
>
>  # Fine
> poi <- as.character(top.GSM396290) #5000 characters
> x.data <- h1[,c(1,7:9)] # 485577 obs of 4 variables
>
> # Pre-allocate the space
> x <- vector("list", 485577) # x <- list()
>
> # Do the "a" stuff once outside the loop so you aren't doing it 485577
> times
> a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")
>
> # Let's use an apply statement instead of a for loop
> # vapply is the fastest since we prespecify the return type.
> x.data[vapply(a, function(x) any(poi %in% x), logical(1)), ]
>
> I think this will do what you wanted (and hopefully much faster)
>
> Note that you could probably tune this further, but I think this
> strikes a good balance between clarity and performance (for now).
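>
> For instance, to put a number on it (just a timing sketch; "keep" and
> "res" are throwaway names):
>
> system.time({
>   a <- strsplit(as.character(x.data[, "UCSC_REFGENE_NAME"]), ";")
>   keep <- vapply(a, function(x) any(poi %in% x), logical(1))
>   res <- x.data[keep, ]
> })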
>
> Hope this helps,
>
> Michael
>
> On Fri, Mar 23, 2012 at 11:52 AM, Kurinji Pandiyan
> <kurinji.pandiyan at gmail.com> wrote:
> >
> > Thank you for the input.
> >
> > As it turns out, my script is utilizing a lot more memory than I claimed:
> > it was initially using 3 GB, but has gone up to 20.24 GB active, with
> > 29.63 GB assigned to the R session.
> >
> > The script has run overnight, and I don't think it is active anymore,
> > since I keep getting the error message that my startup disk is out of
> > space for application memory.
> >
> > I am attaching screenshots of my RAM usage distribution (given that
> > there is no fluctuation in the R session's usage, I believe it is no
> > longer running) and of my available HD space.
> >
> >
> > Here is my script -
> >
> > poi <- as.character(top.GSM396290) #5000 characters
> > x.data <- h1[,c(1,7:9)] # 485577 obs of 4 variables
> > head(x.data)
> >
> > x <- list()
> >
> > for(i in 1:485577){
> >   a <- as.character(x.data[i, "UCSC_REFGENE_NAME"])
> >   a <- unlist(strsplit(a, ";"))
> >   if(any(poi %in% a)) { x[[i]] <- x.data[i, ] }
> > }
> >
> >  # this step completed in a few hours
> >
> > x <- do.call(rbind, x) # this step has been running overnight and is still stuck
> >
> > Thanks, I really appreciate the help.
> > Kurinji
> >
> > On Thu, Mar 22, 2012 at 10:44 PM, R. Michael Weylandt
> > <michael.weylandt at gmail.com> wrote:
> >>
> >> Well... what makes you think you are hitting memory constraints then?
> >> If you have significantly less than 3GB of data, it shouldn't surprise
> >> you if R never needs more than 3GB of memory.
> >>
> >> You could just be running your scripts inefficiently...it's an extreme
> >> example, but all the memory and gigaflopping in the world can't speed
> >> this up (by much):
> >>
> >> for(i in seq_len(1e6)) Sys.sleep(10)
> >>
> >> Perhaps you should look into profiling tools or parallel
> >> computation...if you can post a representative example of your
> >> scripts, we might be able to give performance pointers.
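> >>
> >> For example, base R's sampling profiler is only a few lines (a sketch;
> >> "profile.out" is just an arbitrary file name):
> >>
> >> Rprof("profile.out")                 # start collecting timing samples
> >> ## ... run the slow part of the script here ...
> >> Rprof(NULL)                          # stop profiling
> >> summaryRprof("profile.out")$by.self  # see where the time actually goes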
> >>
> >> Michael
> >>
> >> On Fri, Mar 23, 2012 at 1:33 AM, Kurinji Pandiyan
> >> <kurinji.pandiyan at gmail.com> wrote:
> >> > Yes, I am.
> >> >
> >> > Thank you,
> >> > Kurinji
> >> >
> >> > On Mar 22, 2012, at 10:27 PM, "R. Michael Weylandt"
> >> > <michael.weylandt at gmail.com> wrote:
> >> >
> >> >> Use 64bit R?
> >> >>
> >> >> Michael
> >> >>
> >> >> On Thu, Mar 22, 2012 at 5:22 PM, Kurinji Pandiyan
> >> >> <kurinji.pandiyan at gmail.com> wrote:
> >> >>> Hello,
> >> >>>
> >> >>> I have a Mac Pro with 32 GB of RAM, a 2 x 2.4 GHz quad-core
> >> >>> processor, and 2 TB of storage. Despite having so much memory, I am
> >> >>> not able to get R to use much more than 3 GB. Some of my scripts take
> >> >>> hours to run, but I would think they would be much faster if more
> >> >>> memory were used. How do I optimize R's memory usage on my Mac Pro?
> >> >>>
> >> >>> Thank you!
> >> >>> Kurinji
> >> >>>
> >
> >
>


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


