[R] Re: normal distribution in samples of soil organisms.

Wed Sep 3 13:19:12 CEST 2003

Hi, 

   You didn't specify the statistical model you are interested in, so I will
suppose it is something like:
#Organisms ~ Landscape + Soil + Depth + Species

I suppose you have a table with something like...

Spec	Lands	Soil	Depth	#Organisms
A	1	1	1	10
A	1	2	1	2
B	1	1	1	0
B	2	2	2	2

... etc,
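
For illustration, a table like the one above could be set up in R roughly as
follows. This is only a sketch: the object name mydata and the column name
Organisms (a name starting with # is not a valid R name) are assumptions,
and the four rows are just the example rows shown above, not real data.

```r
# Hypothetical reconstruction of the example rows above;
# replace this with your own data (e.g. read in with read.table()).
mydata <- data.frame(
  Spec      = c("A", "A", "B", "B"),
  Lands     = c(1, 1, 1, 2),
  Soil      = c(1, 2, 1, 2),
  Depth     = c(1, 1, 1, 2),
  Organisms = c(10, 2, 0, 2)   # counts, zeroes included
)

# The design variables are categories, not measurements,
# so store them as factors rather than as numbers.
factor_cols <- c("Spec", "Lands", "Soil", "Depth")
mydata[factor_cols] <- lapply(mydata[factor_cols], factor)

str(mydata)
```

If the codes were left as numbers, lm() would treat them as continuous
covariates instead of factor levels.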

Normally, the count of organisms in soil samples follows a Poisson
distribution, not a normal one. You should indeed check for normality
in your final data table, including the zeroes. In fact, you can do an
ANOVA and then look at some diagnostic plots. In R you would do something
like this (suppose your table is stored in an object called mydata):

mydata.lm <- lm(Organisms ~ Spec + Lands + Soil + Depth, data = mydata)  # fit a linear model

anova(mydata.lm) # to print a nicely formatted anova table

plot(mydata.lm)  # to draw some diagnostic plots
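
To see what the normal Q-Q plot looks like for count data, here is a small
sketch on simulated data. Everything in it is an assumption made for
illustration: the object names sim and sim.lm, the 2 x 2 x 2 x 2 design with
5 replicates, and the Poisson mean of 1.5. It is not your data.

```r
set.seed(1)  # arbitrary seed, only to make the simulation reproducible

# Simulated stand-in for the real table: every combination of the four
# factors, 5 replicates each, with Poisson counts of small mean.
sim <- expand.grid(Spec = c("A", "B"), Lands = c("1", "2"),
                   Soil = c("1", "2"), Depth = c("1", "2"),
                   rep = 1:5)
sim$Organisms <- rpois(nrow(sim), lambda = 1.5)

sim.lm <- lm(Organisms ~ Spec + Lands + Soil + Depth, data = sim)

# Only the normal Q-Q plot of the residuals
# (the second of the plots that plot() draws for an lm object).
plot(sim.lm, which = 2)

# The same plot built by hand from the residuals.
qqnorm(residuals(sim.lm))
qqline(residuals(sim.lm))
```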

If your data follow a Poisson distribution, the Q-Q plot obtained above
will deviate strikingly from a straight line. These plots are normally
more useful than just obtaining p-values from normality tests. If your data
do indeed deviate from normality, you should then apply a suitable
transformation and repeat the analysis. I strongly advise you to have a
look at the book Biometry, by Robert R. Sokal and F. James Rohlf (1996);
it has quite good examples of where and which transformations
should be applied, especially for transforming data from a Poisson-like
distribution to a normal one.
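
As a sketch of the transform-and-repeat step: one commonly used
transformation for Poisson-like counts is the square root, often with 0.5
added when there are zeroes. That choice is an assumption here, not
necessarily the best one for your data, and the data below are simulated
(the names sim, sqrtOrg, raw.lm and sqrt.lm are made up for the example).

```r
set.seed(2)  # arbitrary seed, only to make the simulation reproducible

# Simulated stand-in: Poisson counts whose mean differs between landscapes.
sim <- expand.grid(Spec = c("A", "B"), Lands = c("1", "2"),
                   Soil = c("1", "2"), Depth = c("1", "2"),
                   rep = 1:5)
sim$Organisms <- rpois(nrow(sim), lambda = ifelse(sim$Lands == "1", 2, 8))

# Square-root transformation; the 0.5 is a common adjustment for zeroes.
sim$sqrtOrg <- sqrt(sim$Organisms + 0.5)

# Fit the same model to the raw and to the transformed counts.
raw.lm  <- lm(Organisms ~ Spec + Lands + Soil + Depth, data = sim)
sqrt.lm <- lm(sqrtOrg   ~ Spec + Lands + Soil + Depth, data = sim)

anova(sqrt.lm)  # anova table for the transformed response

# Q-Q plots of the residuals, before and after, side by side.
op <- par(mfrow = c(1, 2))
qqnorm(residuals(raw.lm), main = "raw counts")
qqline(residuals(raw.lm))
qqnorm(residuals(sqrt.lm), main = "sqrt(count + 0.5)")
qqline(residuals(sqrt.lm))
par(op)
```

Whether the transformed residuals look acceptable is still something to
judge from the plots, in the same way as for the untransformed fit.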

Hope this helps.

acovaleda at hotpop.com wrote:
> Hi Sirs and Madams.
> 
> My question is more statistical than related to the use of R software, and I
> hope it will not seem too silly and elementary. I'm analyzing a set of data
> on some soil organisms collected in different landscapes, soil taxa, and
> depths. The sampling was designed with a factorial structure with four
> factors: Species, Landscape, Soil and Depth. Because not all the species
> appear in each sample, there are many zeros in the data matrix.
> 
> When checking for normality, I'm not sure if I must check it in the
> original sample data (without zeros) or in the big matrix with zeros. In the
> first case there is a normal distribution (W = 0,85) but in the second it is
> not (W = 0,45). In which data must I check the distribution? Can I proceed to
> perform a parametric ANOVA?
> 
> Thanks
>