[R] R runtime performance and memory usage

Sasikumar Kandhasamy ckmsasi at gmail.com
Wed Nov 18 06:01:36 CET 2015


Thanks a lot, Martin and William. It looks like we can't apply predict() to
lsfit and lm.fit objects, and I am trying to use the lm object to predict
values for a new data frame.
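
Neither lsfit() nor .lm.fit() returns an object with a predict() method, but a
minimal sketch (not from the original thread, with made-up data and a single
numeric predictor) shows that predictions for new data can still be computed by
hand from the fitted coefficients:

    ## Sketch only: the fitted coefficients can be applied to new data directly.
    set.seed(1)
    x <- runif(250000)
    y <- 2 + 3 * x + rnorm(250000)

    fit <- .lm.fit(cbind(1, x), y)        # design matrix with explicit intercept

    newx  <- c(0.1, 0.5, 0.9)             # hypothetical new data
    preds <- cbind(1, newx) %*% fit$coefficients
    preds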

Thanks & Regards
Sasi

On Tue, Nov 17, 2015 at 9:49 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

> >>>>> William Dunlap <wdunlap at tibco.com>
> >>>>>     on Mon, 16 Nov 2015 16:01:42 -0800 writes:
>
>     > If a quick running time is important and your models involve only
>     > numeric data with no missing values and you are willing to spend more
>     > programming time setting things up, the lsfit() function may work
>     > better for you.
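
A minimal sketch of what that looks like (illustrative data, not from the
original post):

    ## lsfit() takes a numeric predictor matrix and a response vector;
    ## an intercept column is added by default.
    set.seed(1)
    x <- runif(250000)
    y <- 2 + 3 * x + rnorm(250000)

    fit <- lsfit(x, y)        # numeric data only, no missing values
    fit$coefficients          # Intercept and X
    ls.print(fit)             # summary-style output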
>
>     > Bill Dunlap
>     > TIBCO Software
>     > wdunlap tibco.com
>
> Or, even faster, there is the extra-simple .lm.fit() function
> (in R >= 3.1.0).
>
> I've written a small demo about it and published it here,
>    http://rpubs.com/maechler/fast_lm
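
For a rough sense of the relative speeds, a sketch along the same lines as that
demo (illustrative data; exact timings depend on the machine):

    ## Compare lm(), lsfit() and .lm.fit() on 250K rows, one numeric predictor.
    set.seed(1)
    x <- runif(250000)
    y <- 2 + 3 * x + rnorm(250000)
    X <- cbind(1, x)                   # .lm.fit() needs the design matrix itself

    system.time(lm(y ~ x))             # full formula/model-frame machinery
    system.time(lsfit(x, y))           # leaner least-squares fit
    system.time(.lm.fit(X, y))         # bare QR-based fit, fastest of the three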
>
> Martin Maechler, ETH Zurich (and R Core)
>
>
>     > On Mon, Nov 16, 2015 at 3:25 PM, Sasikumar Kandhasamy <ckmsasi at gmail.com> wrote:
>     >> Thanks a lot Bill & Bert.
>     >>
>     >> Hi Bill,
>     >>
>     >> Sorry, I was wrong about the number of records; actually, I am using
>     >> two-dimensional data of 250K records each. And regarding CPU usage, it
>     >> was the elapsed time. In fact, I have pinned one core to run R.
>     >>
>     >> Thanks & Regards
>     >> Sasi
>     >>
>     >> On Mon, Nov 16, 2015 at 2:04 PM, William Dunlap <wdunlap at tibco.com> wrote:
>     >>>
>     >>> You cannot do a linear regression with one column of data - there
>     >>> must be at least one response column and one predictor.  By default,
>     >>> lm throws in a constant term, which gives you a second predictor.  If
>     >>> your predictor is categorical, you get a new column for all but the
>     >>> first unique value in it.
>     >>>
>     >>> lm() deals only with double-precision data, at 8 bytes/number.  Thus
>     >>> 250k numbers occupy 2 million bytes.  Your three columns (in the
>     >>> non-categorical-predictor case) take up 6 million bytes.
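
That arithmetic is easy to check (a sketch, not from the thread):

    ## 250,000 doubles at 8 bytes each is about 2 MB, plus a small header.
    x <- numeric(250000)
    object.size(x)                                    # about 2,000,000 bytes
    print(object.size(cbind(1, x, x)), units = "MB")  # three such columns: ~5.7 MB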
>     >>>
>     >>> lm()'s output contains several columns the size of the response
>     >>> variable: residuals, effects, and fitted.values.  It also contains
>     >>> the QR decomposition of the design matrix (the size of all the
>     >>> predictor columns together).
>     >>>
>     >>> There are also some temporary variables generated in the course of
>     >>> the computation.
>     >>>
>     >>> So your observed 40 MB memory usage seems reasonable.
>     >>>
>     >>> Use the object.size() function to see how big objects are and str()
>     >>> to look at their structure.
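
For example (a sketch with made-up data; component sizes will vary):

    ## Inspect what lm() stores and how much space each piece takes.
    set.seed(1)
    d   <- data.frame(x = runif(250000), y = rnorm(250000))
    fit <- lm(y ~ x, data = d)

    print(object.size(fit), units = "MB")  # whole fitted object
    str(fit, max.level = 1)                # residuals, effects, fitted.values, qr, model, ...
    sapply(fit, object.size)               # rough size of each component; the stored
                                           # model frame is often the largest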
>     >>>
>     >>> My laptop with a 2.5 GHz Intel i7 processor takes a quarter second
>     >>> to fit a simple linear model with one numeric predictor and a
>     >>> constant term.  6 seconds sounds slow.  Is that CPU or elapsed time
>     >>> (use system.time() to see)?
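
A minimal example of that check (illustrative data and output):

    ## system.time() reports CPU time ("user" + "system") and wall-clock
    ## ("elapsed") time separately, which answers Bill's question.
    set.seed(1)
    d <- data.frame(x = runif(250000), y = rnorm(250000))
    system.time(lm(y ~ x, data = d))
    ##    user  system elapsed
    ##   0.25    0.01    0.26    (illustrative numbers; yours will differ)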
>     >>>
>     >>>
>     >>>
>     >>> Bill Dunlap
>     >>> TIBCO Software
>     >>> wdunlap tibco.com
>     >>>
>     >>>
>     >>> On Mon, Nov 16, 2015 at 12:25 PM, Sasikumar Kandhasamy
>     >>> <ckmsasi at gmail.com> wrote:
>     >>> > Hi All,
>     >>> >
>     >>> > I have a couple of questions about R run-time performance. I have
>     >>> > the R-3.2.2 package compiled for MIPS64 and am running it on my
>     >>> > Linux machine with a mips64 processor (core speed 1.5 GHz), and I
>     >>> > am observing the following behavior:
>     >>> >
>     >>> > 1. Applying a linear regression model (lm) to 1 MB of data
>     >>> > (containing 1 column of 250K records) takes ~6 seconds to complete.
>     >>> > Any idea whether this is expected behavior or not? If not, can you
>     >>> > please share suggestions or options to improve it, if there are any?
>     >>> >
>     >>> > 2. Also, the R process's virtual memory increases by 40 MB after
>     >>> > applying the linear model to the 1 MB of data. Is that also expected
>     >>> > behavior? If it is, can you please share some insight into the
>     >>> > memory usage?
>     >>> >
>     >>> > Thanks in advance.
>     >>> >
>     >>> > Regards
>     >>> > Sasi
>     >>> >



