[R] Help on choosing the appropriate analysis method

Juhász Péter peter.juhasz83 at gmail.com
Sun Oct 17 10:26:46 CEST 2010


Dear R-help,

I'd like to ask for your opinion on choosing the "right" strategy for a
particular dataset.

We conducted 24-hour electric field measurements on 90 subjects. They
are grouped by job (2 categories) and location (3 categories). There are
four exposure metrics assigned to each subject. 

An excerpt from the data:

n	job	location	M	OA	UE	all
0	job1	dist_200	0.297	0.072	0.171	0.297
1	job1	dist_200	0.083	0.529	0.066	0.529
2	job1	dist_200	0.105	0.145	1.072	1.072
3	job1	dist_200	0.096	0.431	0.099	0.431
4	job1	dist_200	0.137	0.077	0.092	0.137
5	job1	dist_20	NA	0.296	0.107	0.296
6	job1	dist_200	NA	1.595	0.293	1.595
7	job1	dist_20	NA	0.085	0.076	0.085
8	job1	dist_20	NA	2.120	0.319	2.120
9	job1	dist_20	NA	0.881	NA	0.881
10	job1	dist_0	NA	0.221	NA	0.221
80	job2	dist_20	0.800	0.342	1.482	1.482
81	job2	dist_20	NA	0.521	0.050	0.521
82	job2	dist_200	NA	0.497	0.502	0.502
83	job2	dist_200	NA	2.777	NA	2.777
84	job2	dist_20	NA	0.127	0.050	0.127
85	job2	dist_200	NA	2.508	0.423	2.508
86	job2	dist_200	0.216	0.350	2.782	2.782
87	job2	dist_200	NA	2.777	1.996	2.777
88	job2	dist_200	2.348	0.890	2.777	2.777
89	job2	dist_200	NA	0.488	NA	0.488

I'd like to know whether the differences between the group means are
significant. Are pairwise t-tests for location, and a two-sample t-test
for job, appropriate in this case?

data <- read.table("data.txt", header = TRUE, nrows = 90)
metrics <- c("all", "M", "OA", "UE")

# Pairwise t-tests between the three locations, Bonferroni-adjusted.
for (m in metrics) {
  cat("\nMetric:", m, "\n")
  print(pairwise.t.test(data[[m]], data$location,
                        p.adjust.method = "bonferroni"))
}

# Two-sample (Welch) t-tests between the two jobs.
for (m in metrics) {
  cat("\nMetric:", m, "\n")
  print(t.test(data[[m]] ~ data$job))
}

I'd also like to compare the four exposure metrics with each other.
How should I do that?
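
To make the question concrete, here is a minimal sketch of one possible
approach, not a recommendation: because every subject carries all of the
metrics, two metrics can be compared within subject (paired). The sketch
uses only the excerpt rows above, compares OA with UE as an arbitrary
example pair, and works on the log scale on the assumption that the data
are roughly lognormal. Note that in the excerpt, `all` equals the largest
of M, OA and UE, so it is not independent of the other three.

```r
# Rows copied from the excerpt above (not the full 90-subject file).
txt <- "n job location M OA UE all
0 job1 dist_200 0.297 0.072 0.171 0.297
1 job1 dist_200 0.083 0.529 0.066 0.529
2 job1 dist_200 0.105 0.145 1.072 1.072
3 job1 dist_200 0.096 0.431 0.099 0.431
4 job1 dist_200 0.137 0.077 0.092 0.137
5 job1 dist_20 NA 0.296 0.107 0.296
6 job1 dist_200 NA 1.595 0.293 1.595
7 job1 dist_20 NA 0.085 0.076 0.085
8 job1 dist_20 NA 2.120 0.319 2.120
9 job1 dist_20 NA 0.881 NA 0.881
10 job1 dist_0 NA 0.221 NA 0.221
80 job2 dist_20 0.800 0.342 1.482 1.482
81 job2 dist_20 NA 0.521 0.050 0.521
82 job2 dist_200 NA 0.497 0.502 0.502
83 job2 dist_200 NA 2.777 NA 2.777
84 job2 dist_20 NA 0.127 0.050 0.127
85 job2 dist_200 NA 2.508 0.423 2.508
86 job2 dist_200 0.216 0.350 2.782 2.782
87 job2 dist_200 NA 2.777 1.996 2.777
88 job2 dist_200 2.348 0.890 2.777 2.777
89 job2 dist_200 NA 0.488 NA 0.488"
d <- read.table(text = txt, header = TRUE)

# In the excerpt, `all` is the row-wise maximum of the other three metrics.
stopifnot(isTRUE(all.equal(d$all, pmax(d$M, d$OA, d$UE, na.rm = TRUE))))

# Paired t-test on the log scale; subjects missing either value are dropped.
cmp_t <- t.test(log(d$OA), log(d$UE), paired = TRUE)

# Rank-based counterpart (Wilcoxon signed-rank test, normal approximation).
cmp_w <- wilcox.test(d$OA, d$UE, paired = TRUE, exact = FALSE)

print(cmp_t)
print(cmp_w)
```

Repeating this for each pair of metrics would call for a multiplicity
correction, as in the Bonferroni-adjusted tests above.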

One potential problem is that the distribution is not normal for any of
the exposure metrics: it's close to lognormal. In fact, it's even worse
than that: the measuring instrument has a relatively high lower
detection limit, and all off-scale low readings are recorded as the
detection limit. In other words, non-detects are left-censored.
Doesn't this make t-tests useless?
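
For illustration, a minimal sketch of three alternatives that might be
considered, using only the UE values from the excerpt above: a t-test on
the log scale, a rank-based test, and a lognormal regression that treats
non-detects as left-censored. The detection limit `lod = 0.05` is purely
an assumption made for this sketch (the real limit is not stated here),
and the censored fit relies on the `survival` package being installed.

```r
# UE values and job labels copied from the excerpt above.
d <- data.frame(
  job = factor(rep(c("job1", "job2"), times = c(11, 10))),
  UE  = c(0.171, 0.066, 1.072, 0.099, 0.092, 0.107, 0.293, 0.076, 0.319,
          NA, NA,
          1.482, 0.050, 0.502, NA, 0.050, 0.423, 2.782, 1.996, 2.777, NA)
)

# 1. Welch t-test on the log scale. Readings at the detection limit are
#    treated as exact values here, which they are not.
res_log <- t.test(log(UE) ~ job, data = d)

# 2. Wilcoxon rank-sum test; censored readings simply become tied low ranks.
res_rank <- wilcox.test(UE ~ job, data = d, exact = FALSE)

print(res_log)
print(res_rank)

# 3. Lognormal regression with left-censoring.
#    ASSUMPTION: a hypothetical detection limit of 0.05.
lod <- 0.05
if (require(survival, quietly = TRUE)) {
  cc <- d[!is.na(d$UE), ]
  cc$detected <- cc$UE > lod   # FALSE marks a left-censored reading
  fit <- survreg(Surv(UE, detected, type = "left") ~ job,
                 data = cc, dist = "lognormal")
  print(summary(fit))
}
```

The coefficient for `job` in the third fit is a difference in mean
log-exposure between the two jobs, so it answers the same question as the
log-scale t-test while accounting for the censoring.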

Thank you in advance,

Péter Juhász


