[R] "lean and mean" regression {was "Memory size"}

Spencer Graves spencer.graves at pdf.com
Mon Jul 14 17:08:42 CEST 2003

Dear Silika:

	  Do you know what makes the memory requirements so large?  Do you have 
many observations, or is it (as Martin just suggested) "several factors 
with many levels"?  If the latter, and if you have not already done 
this, I suggest you think very carefully about whether you need all those 
(unordered) levels.  If you have many levels with only one observation 
per level, then I suggest you first delete those observations: 
you would get residuals == 0 for those observations anyway, and you can 
just as well handle that part of the problem manually.  If you have many 
levels with more than one but still very few observations per level, 
appropriate preparation for the regression might be to convert the unordered 
factor levels to an ordinal scale, then to numerics, and then regress on a 
low-order polynomial in the made-up scale.  That is old technology but 
can still be quite useful.
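A minimal sketch of that preparation in R, on made-up data (all object 
names, and the choice of ordering the levels by their mean response, are 
illustrative, not the only sensible choices):

```r
## Hypothetical data: response y, covariate x, factor f with many levels
set.seed(1)
f <- factor(sample(letters, 200, replace = TRUE))
x <- rnorm(200)
y <- 2 * x + as.integer(f) / 10 + rnorm(200)

## 1. Drop levels seen only once: their residuals would be 0 anyway
counts <- table(f)
keep   <- f %in% names(counts)[counts > 1]
f2 <- factor(f[keep])            # re-factor to drop the unused levels
x2 <- x[keep]
y2 <- y[keep]

## 2. Order the remaining levels (here, by mean response), convert the
##    ordering to a numeric score, and fit a low-order polynomial in it
score <- rank(tapply(y2, f2, mean))   # ordinal scale for the levels
fnum  <- score[as.character(f2)]      # numeric score per observation
fit   <- lm(y2 ~ x2 + poly(fnum, 2))  # 4 coefficients instead of ~27
```

The point is the size of the fit: a quadratic in the made-up scale costs 
two coefficients, however many levels the factor has.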

	  In some applications, science progresses like this:  unordered 
categories get ordered, then transformed to an ordinal scale, and then to a 
quantitative scale.  Checking for outliers might reveal misplaced levels.

hope this helps.  spencer graves
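
P.S.  A small illustration of the "lean and mean" alternatives discussed 
below, on made-up data (all object names are illustrative; the within-group 
centering sketches the idea behind the one-factor example Martin mentions 
from Venables & Ripley):

```r
set.seed(2)
n <- 1000
x <- matrix(rnorm(2 * n), n, 2)               # continuous covariates
g <- factor(sample(1:50, n, replace = TRUE))  # one factor, many levels
y <- as.vector(x %*% c(1, -1)) + as.integer(g) / 25 + rnorm(n)

## lm.fit() skips the formula / model-frame machinery of lm(),
## but you must build the design matrix yourself:
X  <- cbind(1, x, model.matrix(~ g)[, -1])
f1 <- lm.fit(X, y)

## For a single factor plus continuous covariates, one can avoid the
## big dummy-variable matrix altogether: sweep out the group means,
## then regress residual on residual.  The covariate slopes agree
## with the full fit (Frisch-Waugh), without the 50 dummy columns.
yc <- y - ave(y, g)
xc <- apply(x, 2, function(col) col - ave(col, g))
f2 <- lm.fit(xc, yc)

f1$coefficients[2:3]   # slopes from the full design matrix
f2$coefficients        # same slopes from the centered fit
```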

Martin Maechler wrote:
>>>>>>"AndyL" == Liaw, Andy <andy_liaw at merck.com>
>>>>>>    on Mon, 14 Jul 2003 09:33:31 -0400 writes:
>     AndyL> How *exactly* did you "run the regression" in R?
>     AndyL> There are several ways, and it can make a big
>     AndyL> difference for large data sets.  lm() would be the
>     AndyL> most expensive option.  If I'm not mistaken, lsfit()
>     AndyL> is more "lean and mean".  
> as a matter of fact, rather use  lm.fit(), which is the ``work horse'' 
> of lm().  lm.fit() and lsfit() are very similar (relying on the
> same Fortran QR decomposition), but lm.fit() has by now been tested 
> {by lm() usage} much more extensively.
>     AndyL> You can even do it more or less by hand, by calling
>     AndyL> qr() directly.  There's also a discussion in Venables
>     AndyL> & Ripley's "S Programming" on this subject (for S-PLUS).
> Section 7.2 (actually the relevant code is not at all S-PLUS specific,
> 	      just the final "resources(.)" [CPU, memory]
> 	      measuring of the solution).
> It's for the case of one factor with many (107!) levels and continuous
> covariates otherwise. There, one can solve without constructing
> the large matrices that all of lm(), lsfit() or lm.fit() would use.
> It becomes really "interesting" if you have (several) factors
> with (many) levels...
> Regards,
> Martin
>     >> -----Original Message----- From: Silika Tereshchenko
>     >> [mailto:silika at access.unizh.ch] Sent: Sunday, July 13,
>     >> 2003 8:55 AM To: R-help at stat.math.ethz.ch Subject: [R]
>     >> Memory size
>     >> 
>     >> 
>     >> 
>     >> Dear all,
>     >> 
>     >> I have a problem.  I could not run the regression,
>     >> because I always get the warning message
>     >> "memory.size".  From the help file I learned that it is
>     >> possible to increase the memory size, but I did not
>     >> understand how I could do it.  Could you please explain it
>     >> to me?  I would be very grateful.
>     >> 
>     >> 
>     >> The second question: I obtained from the regression the
>     >> coefficients "6.003e-3" and "0.0345e+3".  What do they
>     >> mean?
>     >> 
>     >> 
>     >> 
>     >> Thanks a lot, Silika
>     >> 
>     >> ______________________________________________
>     >> R-help at stat.math.ethz.ch mailing list
>     >> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>     >> 
