[Rd] R and Gnumeric

Jean Bréfort jean.brefort at normalesup.org
Sun Jun 8 13:27:34 CEST 2008


Hi,

I just read the "Embedding R in Gnumeric" idea at
http://www.r-project.org/SoC08/ideas.html. On my side, I intend to add
as many statistics-related plot types to the current gnumeric charting
engine as possible. We already have boxplots and partial support for
histograms. My immediate plans are to finish the histogram code and add
probability plots (http://bugzilla.gnome.org/show_bug.cgi?id=500168)
during the summer if time permits (importing some code from R).
For the future, I see two options: either add all the necessary plot types
to the gnumeric charting engine, or embed R charts directly using
either a new SheetObject class or the goffice component system (which
would also allow inserting these charts in abiword).
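As a rough illustration of what a probability plot involves, a normal probability plot pairs the sorted sample with standard-normal quantiles evaluated at "plotting positions". The sketch below assumes the plotting-position rule of R's `ppoints()` (a = 3/8 for n ≤ 10, otherwise a = 1/2), which R's `qqnorm()` relies on. The function names are hypothetical; this is neither Gnumeric's nor R's actual code.

```python
from statistics import NormalDist


def ppoints(n):
    """Plotting positions (i - a) / (n + 1 - 2a) for i = 1..n, using the
    rule assumed from R's ppoints(): a = 3/8 if n <= 10, else 1/2."""
    a = 3 / 8 if n <= 10 else 1 / 2
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def normal_probability_plot(values):
    """Return (theoretical quantile, observed value) pairs for a normal
    probability plot; observations are sorted in ascending order."""
    sample = sorted(values)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    quantiles = [std_normal.inv_cdf(p) for p in ppoints(len(sample))]
    return list(zip(quantiles, sample))


if __name__ == "__main__":
    # Y column from the regression example in this message
    for q, v in normal_probability_plot([2, 4, 5, 8, 0, 7, 8, 9, 10]):
        print(f"{q:8.4f}  {v}")
```

Plotting the pairs as a scatter chart gives the probability plot; points close to a straight line suggest the data are roughly normal.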

One other, totally unrelated thing: we recently got a bug report about an
incorrect R squared in the gnumeric regression code
(http://bugzilla.gnome.org/show_bug.cgi?id=534659). R (version 2.7.0)
gives the same result as Gnumeric, as can be seen below:

> mydata <- read.csv(file="data.csv",sep=",")
> mydata
  X  Y
1 1  2
2 2  4
3 3  5
4 4  8
5 5  0
6 6  7
7 7  8
8 8  9
9 9 10
> summary(lm(mydata$Y~mydata$X))

Call:
lm(formula = mydata$Y ~ mydata$X)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.8889  0.2444  0.5111  0.7111  2.9778 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)   1.5556     1.8587   0.837   0.4303  
mydata$X      0.8667     0.3303   2.624   0.0342 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 2.559 on 7 degrees of freedom
Multiple R-squared: 0.4958,	Adjusted R-squared: 0.4238 
F-statistic: 6.885 on 1 and 7 DF,  p-value: 0.03422 

> summary(lm(mydata$Y~mydata$X-1))

Call:
lm(formula = mydata$Y ~ mydata$X - 1)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.5614  0.1018  0.3263  1.6632  3.5509 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
mydata$X   1.1123     0.1487   7.481 7.06e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 2.51 on 8 degrees of freedom
Multiple R-squared: 0.8749,	Adjusted R-squared: 0.8593 
F-statistic: 55.96 on 1 and 8 DF,  p-value: 7.056e-05 
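For reference, the estimates in the first summary (the model with an intercept) can be reproduced from the usual closed-form least-squares formulas. This minimal sketch (the helper name is hypothetical) computes the intercept, slope, residual standard error and R² = 1 - RSS/Σ(y - ȳ)², which for a simple regression with intercept equals the squared Pearson correlation between x and y.

```python
import math


def fit_with_intercept(x, y):
    """Ordinary least squares for y = b0 + b1 * x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)  # centered total sum of squares
    r2 = 1 - rss / tss
    rse = math.sqrt(rss / (n - 2))  # n - 2: two estimated coefficients
    return b0, b1, r2, rse


if __name__ == "__main__":
    x = list(range(1, 10))
    y = [2, 4, 5, 8, 0, 7, 8, 9, 10]
    b0, b1, r2, rse = fit_with_intercept(x, y)
    print(f"intercept={b0:.4f} slope={b1:.4f} R2={r2:.4f} RSE={rse:.3f}")
```

On the data above this should agree with R's first summary (1.5556, 0.8667, 0.4958 and 2.559).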

I am unable to figure out what this 0.8749 value represents. If it is
intended to be the square of the Pearson product-moment correlation
coefficient, it should be 0.4958, and if it is the coefficient of
determination, I think the correct value would be 0.4454, as given by
Excel. It is of course nice to get the same result in R and Gnumeric,
but it would be better if that result were correct (and if it is
correct, the documentation needs fixing). By the way, I am not a
statistics expert at all.
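One way to check where 0.8749 comes from: R's help page for `summary.lm` defines R² as 1 - Σ R[i]² / Σ (y[i] - y*)², where y* is the mean of y if there is an intercept and zero otherwise. Assuming that definition applies here, the sketch below fits the no-intercept model (slope = Σxy/Σx²) and computes R² both ways: against the uncentered total sum of squares Σy², and against the centered one Σ(y - ȳ)². The helper name is hypothetical.

```python
def fit_through_origin(x, y):
    """Least squares for y = b1 * x (no intercept); returns the slope and
    R^2 computed against uncentered and centered total sums of squares."""
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    rss = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mean_y = sum(y) / len(y)
    # Assumption: R's summary.lm uses y* = 0 (uncentered) without intercept
    r2_uncentered = 1 - rss / sum(yi * yi for yi in y)
    # Centered version: 1 - RSS / sum((y - mean(y))^2)
    r2_centered = 1 - rss / sum((yi - mean_y) ** 2 for yi in y)
    return b1, r2_uncentered, r2_centered


if __name__ == "__main__":
    x = list(range(1, 10))
    y = [2, 4, 5, 8, 0, 7, 8, 9, 10]
    b1, r2_u, r2_c = fit_through_origin(x, y)
    print(f"slope={b1:.4f} R2(uncentered)={r2_u:.4f} R2(centered)={r2_c:.4f}")
```

On the data above, the uncentered value should match R's 0.8749 and the centered value Excel's 0.4454. Note that the centered version can become negative for a fit through the origin, since the residual sum of squares is then not bounded by Σ(y - ȳ)².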

Best regards,
Jean Brefort


