[R] t test problem?

Ramon Diaz-Uriarte rdiaz at cnio.es
Thu Sep 23 10:27:23 CEST 2004


On Wednesday 22 September 2004 13:07, Ted Harding wrote:
> On 22-Sep-04 kan Liu wrote:
> > Hi, many thanks for your helpful comments and suggestions. Attached
> > are the data in both log10 scale and original scale. I would be very
> > grateful if you could suggest which version of the test should be used.
> >
> > By the way, how can I check in R whether the variation is additive
> > (natural scale) or multiplicative (log scale)? And how can I check
> > whether the distribution of the data is normal?
>
> As for additive vs multiplicative, this can only be judged in terms
> of the process by which the values are created in the real world.


Just my 2 cents: I often find it helpful to ask myself (or the "client") 
whether, if there were a difference ("something") between the two samples, 
I/she/he thinks the appropriate model is (please read "=" as "approximately 
equal")

sample.1 = sample.2 + something [1]

OR

sample.1 = sample.2 * something [2]

(i.e., the ratio of means is a constant: sample.1/sample.2 = something)

which, by log transforming becomes

log(sample.1) = log(sample.2) + log(something)

I am not including here the issue of error distribution, but often when 
the model for the means is like [2] the error terms are multiplicative (i.e., 
additive in the log scale). At least in many biological and engineering 
problems, it is often evident whether [1] or [2] is appropriate for the 
data, given what we know about the subject.
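A minimal sketch of model [2] in R, using hypothetical simulated data (not the poster's x and y) and an assumed multiplicative effect something = 2 with log-normal errors. It shows that on the raw scale the ratio of means estimates "something", while on the log scale the difference of means estimates log(something), i.e. the effect becomes additive:

```r
## Hypothetical simulated data, NOT the poster's x and y.
set.seed(1)
n <- 200
something <- 2   # assumed multiplicative effect, as in model [2]
sample.2 <- exp(rnorm(n, mean = 11, sd = 0.9))
## Multiplicative effect plus multiplicative (log-normal) errors
sample.1 <- something * exp(rnorm(n, mean = 11, sd = 0.9))

## Raw scale: the ratio of means estimates `something`
ratio <- mean(sample.1) / mean(sample.2)
## Log scale: the difference of means estimates log(something)
d <- mean(log(sample.1)) - mean(log(sample.2))
c(ratio = ratio, log.diff = d, log.something = log(something))
```

Note that exp(d) is a ratio of geometric means, which is what a t test on the log scale actually compares.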

Best,

R.

> As for normality vs non-normality, an appraisal can often be made
> simply by looking at a histogram of the data.
>
> In your case, the commands
>   hist(x,breaks=10000*(0:100))
>   hist(y,breaks=10000*(0:100))
> indicate that the distributions of x and y do not look at all
> "normal", since they both have considerable positive skewness
> (i.e. long upper tails relative to the main mass of the distribution).
>
> This does strongly suggest that a logarithmic transformation would
> give data which are more nearly normally distributed, as indeed
> is confirmed by the commands
>   hist(log(x))
>   hist(log(y))
> though in both cases the histograms show some irregularity compared
> with what you would expect from a sample from a normal distribution:
> the commands
>   hist(log(x),breaks=0.2*(40:80))
>   hist(log(y),breaks=0.2*(40:80))
> show that log(x) has an excessive peak at around 11.7,
> while log(y) has holes at around 11.1 and 12.1.
>
> Nevertheless, this inspection of the data shows that the use of
> log(x) and log(y) will come much closer to fulfilling the conditions
> of validity of the t test than using the raw data x and y.
>
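A minimal sketch of this raw-versus-log comparison, assuming hypothetical log-normal data in place of the attached x (which is not reproduced here). The `skew` function is an illustrative moment-based helper, not a base R function:

```r
## Hypothetical right-skewed stand-in for x: log-normal values.
set.seed(2)
x <- exp(rnorm(300, mean = 11, sd = 0.9))

## Illustrative moment-based sample skewness (hypothetical helper,
## not part of base R).
skew <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

## Raw scale: long upper tail, strongly positive skewness.
## Log scale: skewness should be close to 0 for log-normal data.
c(raw = skew(x), log = skew(log(x)))

## Visual check, raw vs log scale side by side.
op <- par(mfrow = c(1, 2))
hist(x, main = "raw scale")
hist(log(x), main = "log scale")
par(op)
```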
> However, it is not merely the *normality* of each which is needed:
> the conditions for the usual t test also require that the two
> populations sampled for log(x) and log(y) should have the same
> standard deviations. In your case, this also turns out to be
> nearly enough true:
>   > sd(log(x))
>   [1] 0.902579
>   > sd(log(y))
>   [1] 0.9314807
>
> > PS, can I confirm that your suggestions mean that, in order to check
> > whether there is a difference in mean between x and y, I need to
> > check the distribution of x and that of y in both natural and log scales
> > and see which one is closer to a normal distribution?
>
> See above for an approach to this: the answer to your question is,
> in effect, "yes". It could of course have happened that neither the
> raw nor the log scale would be satisfactory, in which case you would
> need to consider other possibilities. And, if the SDs had turned out
> to be very different, you should not use the standard t test but
> a variant which is adapted to the situation (e.g. the Welch test).
>
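A minimal sketch contrasting the two versions of the test in R, using hypothetical simulated log-scale samples whose SDs loosely mimic the 0.90 and 0.93 above (sample sizes and means are assumptions). `var.equal = TRUE` gives the standard pooled-variance t test; R's default `t.test` is the Welch test:

```r
## Hypothetical simulated log-scale samples, not log(x) and log(y).
set.seed(3)
lx <- rnorm(150, mean = 11.5, sd = 0.90)
ly <- rnorm(150, mean = 11.3, sd = 0.93)

## Standard two-sample t test: assumes equal SDs (pooled variance).
pooled <- t.test(lx, ly, var.equal = TRUE)
## Welch test: does not assume equal SDs (R's default).
welch <- t.test(lx, ly)

c(pooled.df = unname(pooled$parameter), welch.df = unname(welch$parameter),
  pooled.p = pooled$p.value, welch.p = welch$p.value)
```

With equal sample sizes the two t statistics coincide and only the degrees of freedom differ, so the choice matters most when both the SDs and the sample sizes are unequal.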
> You can, of course, also perform formal tests for skewness, for
> normality, and for equality of variances.
>
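A minimal sketch of such formal tests with base R functions, again on hypothetical simulated log-scale samples rather than the poster's data: `shapiro.test` (Shapiro-Wilk normality test, 3 to 5000 observations) and `var.test` (F test for equality of two variances, itself sensitive to non-normality):

```r
## Hypothetical simulated log-scale samples, not log(x) and log(y).
set.seed(4)
lx <- rnorm(150, mean = 11.5, sd = 0.90)
ly <- rnorm(150, mean = 11.3, sd = 0.93)

## Shapiro-Wilk normality test, one sample at a time.
sw.x <- shapiro.test(lx)
sw.y <- shapiro.test(ly)

## F test of equal variances: statistic is var(lx) / var(ly).
vt <- var.test(lx, ly)

c(shapiro.x.p = sw.x$p.value, shapiro.y.p = sw.y$p.value,
  F = unname(vt$statistic), F.p = vt$p.value)

## A normal Q-Q plot is a useful visual companion to the tests.
qqnorm(lx); qqline(lx)
```

Such tests complement rather than replace the plots: with large samples they flag trivial departures, and with small samples they have little power.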
> Best wishes,
> Ted.
>
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 094 0861   [NB: New number!]
> Date: 22-Sep-04                                       Time: 12:07:07
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)



