[Rd] [R] Increasing number of observations worsen the regression model

peter dalgaard pdalgd using gmail.com
Mon May 27 11:31:41 CEST 2019

Yes, it is important that it only happens with certain BLAS implementations, so it is probably not really an R issue.
However, there has been some concern over the C/Fortran interfaces lately, so if you could narrow it down to a specific BLAS routine, it could prove useful for the developers.
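
As a rough sketch of how one might start narrowing it down, the following solves one and the same least-squares problem by several routes and compares each against the known coefficients. The problem size, the coefficients and the noise level are made up for illustration. To my understanding, lm.fit() goes through R's LINPACK-derived QR code, qr(..., LAPACK = TRUE) goes through LAPACK, and crossprod() goes through the BLAS matrix-product routines, while the last variant forms the cross-products with colSums() and only calls LAPACK for a 3 x 3 solve. A route that disagrees with the others would point at the routines behind it.

```r
# Solve the same least-squares problem by different routes and compare.
# n, beta and the noise level are arbitrary illustrative choices.
set.seed(1)
n <- 20000
X <- cbind(1, rnorm(n), rnorm(n))
beta <- c(1, 2, -3)
y <- drop(X %*% beta) + rnorm(n, sd = 0.1)

# Cross-products formed column by column with colSums(), avoiding the
# BLAS matrix-product routines for the large sums.
XtX_plain <- sapply(seq_len(ncol(X)), function(j) colSums(X * X[, j]))
Xty_plain <- colSums(X * y)

estimates <- list(
  lm_fit    = unname(lm.fit(X, y)$coefficients),          # default QR used by lm()
  qr_lapack = unname(qr.coef(qr(X, LAPACK = TRUE), y)),   # LAPACK QR
  normal_eq = drop(solve(crossprod(X), crossprod(X, y))), # BLAS cross-products
  plain     = drop(solve(XtX_plain, Xty_plain))           # colSums() cross-products
)

# Largest absolute deviation from the true coefficients, per route.
deviation <- sapply(estimates, function(b) max(abs(b - beta)))
print(deviation)
```

With a healthy BLAS all four deviations should be of the same small order of magnitude.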

One fairly easy thing to do would be to find the breakdown point. I speculate that it could be at 16384 (=2^14) and that some sort of endianness or integer width declaration is the cause. (It would in turn suggest that MKL is using 16-bit integers somehow, which doesn't really seem credible, but you never know.)
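
Something along these lines might do for locating the breakdown point. It is only a sketch: the model, the true coefficients, the noise level, the tolerance and the grid of sizes around powers of two are all my own assumptions, not taken from the original report.

```r
# Fit the same simulated model at increasing n and flag the sizes at which
# lm() no longer recovers the known coefficients.
set.seed(1)
true_beta <- c(1, 2, -3)

# Largest absolute error in the estimated coefficients for a problem of size n.
max_coef_error <- function(n) {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  y <- true_beta[1] + true_beta[2] * x1 + true_beta[3] * x2 + rnorm(n, sd = 0.1)
  fit <- lm(y ~ x1 + x2)
  max(abs(unname(coef(fit)) - true_beta))
}

# Sizes just below, at, and just above each power of two from 2^10 to 2^17.
sizes <- sort(unique(c(2^(10:17) - 1, 2^(10:17), 2^(10:17) + 1)))
result <- data.frame(n = sizes,
                     max_error = vapply(sizes, max_coef_error, numeric(1)))
print(result)

# Anything above tol (or NA) is far outside what sampling noise would explain.
tol <- 0.05
failed <- is.na(result$max_error) | result$max_error > tol
if (any(failed)) {
  cat("First failing n:", min(result$n[failed]), "\n")
} else {
  cat("No breakdown found up to n =", max(sizes), "\n")
}
```

If a failing size turns up, a finer grid between the last good and the first bad n would pin down the exact threshold.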

I'm moving this to the r-devel list. It certainly is not for r-help.


> On 27 May 2019, at 10:47 , Ivan Krylov <krylov.r00t using gmail.com> wrote:
> On Sat, 25 May 2019 14:38:07 +0200
> Raffa <raffamaiden using gmail.com> wrote:
>> I have tried to ask for example in CrossValidated 
>> <https://stats.stackexchange.com/questions/410050/increasing-number-of-observations-worsen-the-regression-model> 
>> but the code works for them. Any help?
> In the comments you note that the problem went away after you replaced
> Intel MKL with OpenBLAS. This is important.
> The code that fits linear models in R is somewhat complex[*]; if
> you want to get to the bottom of the problem, you may have to take
> parts of it and feed them differently-sized linear regression problems
> until you narrow it down to a specific set of calls to BLAS or LAPACK
> functions which Intel MKL provides.
> One option would be to ask at Intel MKL forums[**].
> -- 
> Best regards,
> Ivan
> [*]
> https://madrury.github.io/jekyll/update/statistics/2016/07/20/lm-in-R.html
> [**] https://software.intel.com/en-us/forums/intel-math-kernel-library/

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com
