[R] FW: Answers to "Problem with differences between S+ and R in parsing output tables with $"

Tue Dec 10 00:35:03 CET 2002

To R-wizards:

Thank you for your quick an insightful responses.  I now have R running
a translated version of S+ code for power calculations for non-normal
data.  This is from the Michael Crawley's new book Statistical
Computing--An Introduction to Data Analysis using S-Plus. Peter Dalgaard
corrected me on the issue of the $ operator not being a parser.  It sure
looked like an advanced Regex type command to me!  Andy Liaw was there
quickly with the answer, while Ben Bolker and Douglas Bates showed the
importance of the str() function to sort out problems like these.
Thanks to all.  I will ask more questions in the future as I plunge more
and more into R.

The code set forward by M. Crawley seems to eloquently establish a
fundamental calculating engine for determining the power of a test for
non-normal data.  If one has seen how many contorted lines are required
for the same thing in programs like C++, FORTRAN, SAS or Statistica you
would appreciate my use of term "eloquence".  How-to discussions of the
calculation of Power [1 - F(-)] tends to be glossed over even in them
most advanced of statistical books.  Not so for Statistical
Computing...for that I recommend it.

The code with S+ and R variants that work is shown below.  It would be
nice if someone could come up with a version that can be used by both S+
and R without transform. The use of the statement "summary(aov(model))"
seems more universal than the statement "summary.aov(model)".  Anyone
know why?

>d<-numeric(1000)
>n<-350
>for(i in 1:1000){
> y1<-rnbinom(n,1,.3)  
> y2<-rnbinom(n,1,.25)
> y<-c(y1,y2)
> fa<-factor(c(rep(1,n),rep(2,n)))
> model<-glm(y~fa,poisson)
># S+ variants
>#d[i]<-as.vector(summary(aov(model))$"Pr(F)")[1]}    #A. Does work in
S+  
>#d[i]<-as.vector(summary.aov(model)$"Pr(F)")[1]}     #B. Does work in
S+  
>#
------------------------------------------------------------------------
># R variants
>d[i]<-as.vector(summary(aov(model))[[1]]$"Pr(>F)"[1])#C. Does work in R

>#
------------------------------------------------------------------------
>}

Essentially an F test comparing two means is been performed 1000 times,
with 1000 tables produced and only the alpha level extracted and kept.
These 1000 alphas are kept in vector d[].  The question then is asked:
How many times out of 1000 does a random sampling of two distributions
known to have different means produce a statistically significant
difference? That proportion is considered an estimate of the Power or
the test.  In this case the simulation could produce a number like 0.840
given when the alpha level of 0.05 and n1=n2=350 are set.
Ex:

> sum(d<0.05)/1000  
>[1] 0.8400

Why I call this a "Power calculating engine" is that:
1. Vectors y1 and y2 can have different n1 and n2.
2. y1 and y2 can come from ANY distribution, even complex ones
3. y1 and y2 can even come from distributions that are not easily
defined, i.e. from bootstrapping.
4. The code can be tailored to estimate Power even when multi-stage
tests are proposed.

I hope that my intuition is correct.

REX 

-----Original Message-----
From: RexBryan [mailto:rexbryan1 at attbi.com] 
Sent: Monday, December 09, 2002 11:51 AM
To: 'r-help at lists.R-project.org'
Subject: Problem with differences between S+ and R in parsing output
tables with $ 

R-wizards

I have successfully run with S+ a statistical power calculation for
non-normal distributions as presented in M. Crawley’s new book.  When I
tried the newest version of R on the same code, the $ parse statement
doesn't seem to retrieve the appropriate number from a table. Note that
some of the cosmetic differences in the two tables have to be dealt with
by the parser. Any ideas what's happening?
REX

# Begin R -------------------------------------------------------------
#
> summary(aov(model))

             	Df 	Sum Sq 	Mean Sq 	F value 	Pr(>F)
fa            	1   	11.1    	11.1  	0.9327 	0.3345
Residuals   	698 	8279.3    	11.9

# R: ... trying to parse the table gives a NULL for the probability of
F... 

> (summary(aov(model))$"Pr(>F)")[1]
NULL

# End R ---------------------------------------------------------------

# Begin S+ ------------------------------------------------------------

> summary(aov(model))

           		Df 	Sum of Sq  	Mean Sq   	F Value
Pr(F) 
fa   			1    	11.063 	11.06286 	0.9326708
0.3345044
Residuals 		698  	8279.314 	11.86148          

# S+ ... using $ to parse the table gives the right answer for the
probability of F...

(summary(aov(model))$"Pr(F)")[1]
[1] 0.3345044

# End S+ --------------------------------------------------------------