[R] Significance test

Fri Sep 23 15:05:55 CEST 2011

Yuta,

Thanks for the response.

Yuta wrote:
> 
> You've got to state the problem little bit more clear.
> 
> What do you mean by "set"? Is it a list of certain possible values,
> available as outcomes of each single measurement (variate)? Or is it
> something else?
> How many variates do you have inside each sample?
> What is it exactly that you want to find? 

Sorry, I should have been clearer. My team is working on a software
system. This system comes with a set of benchmarks that exercise specific
functionality. I am attempting to measure the performance impact of the
changes made by my team. 

Each of the samples in my previous post represents a particular "build" of
this software system. For each build there are five measurements of a
benchmark execution (each benchmark is executed five times for each
build). 

Each measurement is a time in seconds, so there isn't a list of all possible
values as such. However, for specific benchmarks the execution times
appear to differ by no less than some minimal amount (4.17e-07 for
the samples I've posted), so the distribution of the measurements is
effectively discrete.
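
One way to check that discreteness is to look at the smallest non-zero gap
between the distinct values observed for a benchmark. The sketch below is
only an illustration: the function name smallest_gap is made up for this
example, and it assumes the timings are available as a plain list of floats.

```python
def smallest_gap(times):
    """Return the smallest non-zero gap between distinct timings.

    Returns None when fewer than two distinct values are present.
    'times' is a hypothetical list of measurements in seconds.
    """
    distinct = sorted(set(times))
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    return min(gaps) if gaps else None
```

If the value returned stays at the same small step across builds (such as
the 4.17e-07 mentioned above), that suggests the timer resolution, not the
software, is setting the granularity.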

Yuta wrote:
> Do you want just to compare sample #1 and #2?
I want to be able to compare any pair of samples (that is, "builds"). 

Yuta wrote:
>  There seems to be not enough variates for reliable result.
Yes, unfortunately, the full set of benchmarks takes a while to run, and
this ties up resources, etc. So the number of variates available for a
particular build is limited. 

Yuta wrote:
>  Still, you may want to look at central tendencies (mean, median), i.e.
> location shift of samples, homogeneity of their variances, or the overall
> shape of empirical distributions.
Yes, I'm basically looking at the difference between the means of the five
runs for two samples. But I need an indicator of whether the difference
is significant. At the moment I'm doing a t-test, and that sort of works,
but from the results I'm getting I'm not sure how accurate it is, so I've
started to wonder if I'm doing something wrong.
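
For reference, the comparison described here can be sketched as Welch's
unequal-variance t-test, which is also what R's t.test() computes by
default. This is a minimal sketch, not necessarily the exact test used
above: the function name welch_t is hypothetical, and it returns only the
statistic and the Welch-Satterthwaite degrees of freedom, since the p-value
needs a t-distribution CDF that the Python standard library does not
provide.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's t-test on two samples of timings.

    'a' and 'b' are the runs for two builds (five values each in the
    setup described above); each needs at least two values and a
    non-zero variance in at least one sample.
    """
    # Squared standard error of each sample mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

With only five runs per build the degrees of freedom lie between 4 and 8,
so the test has little power to detect small shifts.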

Yuta wrote:
>  If your data are NOT normally distributed
The way the benchmarks are calculated, each measurement is itself a mean. I
believe the mean of the five means should be normally distributed (at least,
if the values weren't discretized, as described above)? I guess the crux of
my question is: does the t-test apply in this case, or should I be doing
something else?
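
One candidate for "something else" that does not rely on normality is an
exact permutation test on the difference of means. The sketch below is an
assumption about what such an alternative could look like, not a
recommendation from this thread; the function name permutation_p_value is
hypothetical. It enumerates every way of splitting the pooled runs into two
groups of the original sizes (252 splits for five runs against five).

```python
from itertools import combinations


def permutation_p_value(a, b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(a) + list(b)
    n, m = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n - sum(b) / m)
    hits = count = 0
    for idx in combinations(range(n + m), n):
        group_sum = sum(pooled[i] for i in idx)
        diff = abs(group_sum / n - (total - group_sum) / m)
        count += 1
        # Small relative tolerance so float rounding does not drop ties.
        if diff >= observed * (1 - 1e-9):
            hits += 1
    return hits / count
```

Even when every run of one build is faster than every run of the other, the
smallest two-sided p-value with five runs each is 2/252, or about 0.0079, so
this test cannot report anything more extreme with so few variates.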

Yuta wrote:
> All in all it seems like you need to consult some statistical textbook = )
> Sokal and Rohlf is a good choice 
Yes, it seems so. Thanks for the recommendation. Looks like I'll be stopping
by the book shop on the way home this evening :).

Regards,
setro

--
View this message in context: http://r.789695.n4.nabble.com/Significance-test-tp3836155p3836770.html
Sent from the R help mailing list archive at Nabble.com.