[R] HELP! Excel and R give me totally different regression results using the exact same data

David Winsemius dwinsemius at comcast.net
Wed Nov 7 19:11:33 CET 2012


On Nov 7, 2012, at 8:53 AM, frauke wrote:

> Hi David, hi Rui,
> 
> thanks for your quick replies. I have replicated David's R results and
> confirmed them with Minitab. Though I'm not sure what you are trying to tell
> me with the code you wrote, David. Do you mean, I should use a dataframe
> rather than a matrix, or use the "data=" part of the lm() function?

What I thought I was demonstrating was:

a) We had different degrees of freedom, which suggests that you made a major error in data preparation. I offered my hypothesis for how this happened, but since you did not provide the requested code showing your data input steps, it was only a guess.

b) I used a dataframe because that is the simplest way of preparing data for presentation to lm().

You can use a matrix to store data and present it to lm(), but storing character representations of numeric data in a matrix (as you appeared to be attempting) seems just plain ... wrong.
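
A minimal sketch of the two points above, using made-up data (the names x, y, dat, m and the values are assumptions for illustration; the original data and input code were never posted). It shows a data frame passed to lm(), and one way, not necessarily what happened here, that character data can change the degrees of freedom:

```r
# Hypothetical toy data, not the data from this thread.
set.seed(1)
x <- 1:10
y <- 2 * x + rnorm(10)

# A data frame keeps each column in its own (numeric) type.
dat <- data.frame(x = x, y = y)
fit <- lm(y ~ x, data = dat)
df.residual(fit)        # 8, i.e. n - 2 for a simple regression

# A matrix holds a single type, so storing the numbers as text
# makes every entry a character string.
m <- cbind(x = as.character(x), y = as.character(y))
is.character(m)         # TRUE

# If a character predictor reaches lm(), it is treated as a factor:
# one level per distinct value, which uses up the degrees of freedom.
dat_chr <- data.frame(x = m[, "x"], y = as.numeric(m[, "y"]))
fit_chr <- lm(y ~ x, data = dat_chr)
df.residual(fit_chr)    # 0

# Converting back to numeric before modelling restores the intended fit.
dat_num <- data.frame(x = as.numeric(m[, "x"]), y = as.numeric(m[, "y"]))
fit_num <- lm(y ~ x, data = dat_num)
df.residual(fit_num)    # 8
```

Checking str(dat) before calling lm() is a quick way to see whether the columns are numeric.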

> 
> Rui seems to be right, too. Excel's regression function doesn't work; I
> cannot replicate the Minitab and R results with it. According to the
> Microsoft website this is probably because the x- and y-values overlap. I am
> truly astonished that such a major bug doesn't at least have a major red
> flag to it. 

Many people are astonished, incredulous, aghast, astounded, (what is the right adjective?) that MS has allowed many errors to persist despite negative reviews by statisticians and mathematicians for decades.

On the other hand, my reading of the commentary suggests a different interpretation of the error conditions. MS says:

"Case 1: The x-value and y-value ranges overlap

If the x-value and y-value ranges overlap, the LINEST worksheet function produces incorrect values in all result cells. Normal statistical probability disallows the values in the x and y ranges to overlap (duplicate each other). Do not overlap the x- and y-value ranges when referencing cells in the formula."

I think that means not that the numeric ranges of the x and y values overlap, but rather that the error occurs when the spreadsheet cell ranges referenced in the formula overlap.


-- 

David Winsemius, MD
Alameda, CA, USA



