[R] Generating summary statistics and simple statistical analysis from my data-set: how can I automate the analysis?

dereksloan djsloan at liv.ac.uk
Tue May 3 16:06:13 CEST 2011


I am fairly new to R and have a (for me) slightly complicated set of data to
analyse. It contains several continuous and categorical variables for a
group of individuals, e.g.:

ID  Sex  Age  Familysize  Phone  Education
1   M    23   3           Yes    Primary
2   F    25   4           Yes    Secondary
3   M    33   5           No     Tertiary
4   F    45   1           Yes    Secondary
5   F    67   10          Yes    Secondary


I want to summarise it in a table as follows;

              All individuals  Male            Female          Comparison between sexes
                                                               (I want to put p-values in this column)
Age           Median (range)   Median (range)  Median (range)  Wilcoxon rank sum test
Family size   Median (range)   Median (range)  Median (range)  Wilcoxon rank sum test
Phone         Number Yes (%)   Number Yes (%)  Number Yes (%)  Chi-squared test
Education                                                      Chi-squared test
  Primary     Number (%)       Number (%)      Number (%)
  Secondary   Number (%)       Number (%)      Number (%)
  Tertiary    Number (%)       Number (%)      Number (%)


How can I use R to do this?
For the continuous variables I know I can write code like;
summary(Age)
by(Age,data["Sex"],summary)
wilcox.test(Age~Sex)
summary(Familysize)
by(Familysize,data["Sex"],summary)
wilcox.test(Familysize~Sex)

but is there any way of automating/looping the analysis so that I get
summaries and comparative statistical analysis of all of the continuous
variables in a single command? I’m sure this could be done by some kind of
‘looping’ given that the analysis is always the same. Presumably I then
still have to copy the output of interest (medians, ranges, p-values) into
the summary table manually?
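
One possible way to loop the continuous-variable analysis is sketched below. It is only a sketch under stated assumptions: the data frame is called `data`, `Sex` has exactly the two levels "M" and "F", and the helper `med_range` and the vector `cont_vars` are hypothetical names introduced for illustration. The toy data are the five rows shown above.

```r
# Toy data copied from the example above; the real data frame will be larger.
data <- data.frame(
  ID         = 1:5,
  Sex        = factor(c("M", "F", "M", "F", "F")),
  Age        = c(23, 25, 33, 45, 67),
  Familysize = c(3, 4, 5, 1, 10)
)

# Hypothetical helper: format a numeric vector as "median (min-max)".
med_range <- function(x) {
  sprintf("%g (%g-%g)", median(x, na.rm = TRUE),
          min(x, na.rm = TRUE), max(x, na.rm = TRUE))
}

# Names of the continuous columns to summarise (assumed; extend as needed).
cont_vars <- c("Age", "Familysize")

# Build one row of the summary table per continuous variable.
cont_rows <- lapply(cont_vars, function(v) {
  x <- data[[v]]
  g <- data$Sex
  # Median (range) within each sex, as a named character vector.
  by_sex <- vapply(split(x, g), med_range, character(1))
  # exact = FALSE uses the normal approximation, which also copes with ties.
  p <- wilcox.test(x ~ g, exact = FALSE)$p.value
  data.frame(Variable = v,
             All      = med_range(x),
             Male     = by_sex[["M"]],
             Female   = by_sex[["F"]],
             p.value  = signif(p, 3),
             stringsAsFactors = FALSE)
})

# Stack the rows into a single table.
cont_table <- do.call(rbind, cont_rows)
print(cont_table)
```

Because the result is an ordinary data frame, it could be written out with `write.csv(cont_table, "summary.csv")` instead of copying values into the summary table by hand.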

For each categorical variable I have really cumbersome code from which I can
extract the information I need for the summary table, e.g.:

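# Cross-tabulate Phone by Sex, then add column totals (N) and row totals
tphone<-xtabs(~Phone+Sex,data=data)
N<-margin.table(tphone,2)
tphone1<-rbind(tphone,N)
Total<-margin.table(tphone1,1)
tphone1<-cbind(tphone1,Total)
# One row per sex (plus Total), one column per Phone level (plus N)
tphone1<-t(tphone1)
tphone1<-as.data.frame(tphone1)
# Percentages of No and Yes within each row
tphone2<-within(tphone1,{
per.No<-100*(No/N)
per.Yes<-100*(Yes/N)
})
# Reorder the columns for display
tphone2<-tphone2[,c(3,2,4,1,5)]
tphone2
chisq.test(tphone)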

but there must be better ways of generating the counts, percentages, and
simple statistical analysis  which I need. Again, can I loop it to do all of
my categorical variables at once?
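
The categorical variables could be handled with a similar loop. The sketch below assumes the data frame is called `data` and that `Sex` is the grouping column; `cat_vars` and `cat_tables` are hypothetical names introduced for illustration. It uses `table`, `addmargins` and `prop.table` in place of the manual `rbind`/`cbind` steps. With counts as small as in the toy data the chi-squared approximation is poor, so the p-values here are only illustrative.

```r
# Toy data copied from the example above.
data <- data.frame(
  Sex       = factor(c("M", "F", "M", "F", "F")),
  Phone     = factor(c("Yes", "Yes", "No", "Yes", "Yes")),
  Education = factor(c("Primary", "Secondary", "Tertiary",
                       "Secondary", "Secondary"))
)

# Names of the categorical columns to summarise (assumed; extend as needed).
cat_vars <- c("Phone", "Education")

cat_tables <- lapply(cat_vars, function(v) {
  counts <- table(data[[v]], data$Sex)   # counts by level and sex
  tab <- addmargins(counts, 2)           # add a "Sum" column (all individuals)
  pct <- 100 * prop.table(tab, 2)        # percentages within each column
  # Combine counts and percentages into "n (p%)" cells.
  cells <- matrix(sprintf("%d (%.1f%%)", as.vector(tab), as.vector(pct)),
                  nrow = nrow(tab), dimnames = dimnames(tab))
  # Chi-squared test on the counts without the margin column;
  # warnings about small expected counts are suppressed for the toy data.
  p <- suppressWarnings(chisq.test(counts)$p.value)
  list(cells = cells, p.value = p)
})
names(cat_tables) <- cat_vars

print(cat_tables$Phone$cells)
print(cat_tables$Education$cells)
```

Each element of `cat_tables` holds the formatted cells and the p-value for one variable, so the rows for the summary table can be pulled out by name, for example `cat_tables$Phone$cells["Yes", ]`.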

Obviously my dataset has more continuous and categorical variables than
those shown above but I’ve abbreviated it for simplicity of explanation – I
need to write simpler/looped code so that the whole thing is not crazily
long-winded. 

Sorry that my approach so far is so bad and long-winded! R is a long uphill
curve to start with, so I’d be very grateful for any help I can get from
anyone who won’t laugh at me.

Derek

