[R] 'mean' and 'sd' calculations do not match

Peter Dalgaard p.dalgaard at biostat.ku.dk
Thu Dec 8 14:43:58 CET 2005


Ulrich Leopold <uleopold at science.uva.nl> writes:

> Dear list,
> 
> I am using R 2.1.1 on a Fedora 3 Linux, 32 bit PC.
> 
> If I compute the aggregated mean and the standard deviation I get
> standard deviation values for factors where the mean was not computed.
> It seems to me that this is somehow related to the NA values. But I
> don't quite understand what is going wrong?

You're using na.rm=TRUE in the sd calculation, but not in the mean
calculation! (The NAs that sd still returns are most likely groups with
only one non-missing observation.)
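
A minimal sketch of the effect, using made-up toy data rather than the
poster's file (the names d, g and x are hypothetical). Group "b" holds one
value and one NA, so its mean is NA without na.rm, while its sd is NA even
with na.rm=TRUE because only a single observation remains:

```r
# Hypothetical toy data: group "a" has two values, group "b" has one value
# and one NA.
d <- data.frame(g = c("a", "a", "b", "b"),
                x = c(1, 3, 5, NA))

# Without na.rm, a single NA makes the whole group mean NA.
m_default <- aggregate(d$x, by = list(g = d$g), FUN = mean)

# Passing na.rm = TRUE to both summaries makes them treat NAs alike.
m <- aggregate(d$x, by = list(g = d$g), FUN = mean, na.rm = TRUE)
s <- aggregate(d$x, by = list(g = d$g), FUN = sd, na.rm = TRUE)

print(m_default)
print(m)
print(s)
```

Passing the same na.rm setting to both calls should make the NA patterns
comparable; any NA left in the sd column then points to a group with fewer
than two usable values.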
 
> Could it be related to the data import already? Some of the imported
> data got the character strings NA and others <NA>. But they are defined
> from the same values, -9999.  

No. It signals a problem, but not this one. <NA> is how a missing value
is printed in factor and character columns, so NO2 and NH4 were not read
as numeric. Most likely (I can't think of any other reason) some of the
entries are not valid numbers: "," instead of "." and similar typos will
do that to you.
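
One possible way to track down such entries, sketched on a made-up vector
(no2 and its values are hypothetical, not taken from the actual file). On
the real data, sapply(chemicS, class) would show which columns were not
read as numeric:

```r
# Hypothetical column as it might arrive when one entry uses a decimal
# comma: the whole column ends up as character (or factor).
no2 <- c("0.001", "0,14", NA, "0.01")

# as.numeric() gives NA for anything it cannot parse. Going through
# as.character() first keeps this correct for factor columns too.
converted <- suppressWarnings(as.numeric(as.character(no2)))

# Entries that were not missing before conversion but are missing
# afterwards are the suspects.
bad <- !is.na(no2) & is.na(converted)
print(no2[bad])
```

Once the offending entries are fixed in the file, the column should come
in as numeric and print NA rather than <NA>.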

> I used the code below. Below the code are parts of the results.
> 
> Cheers, Ulrich
> 
> Data import:
> 
> chemicS <- read.table("ChemieUlli_4_Quellen.csv", header = TRUE,
>                       sep = ",", na.strings = "-9999")
> 
> Count EC        NO3    NO2    NH4
> 3504  630.0000  33.00  0.001  0.01 
> 3505        NA  26.66   <NA>  <NA> 
> 3506        NA   0.72   <NA>  <NA> 
> 3507        NA     NA   <NA>  <NA> 
> 3508        NA     NA   <NA>  <NA> 
> 3509        NA     NA   <NA>  <NA> 
> 3510 1210.0000  14.00  0.001  0.01 
> 3511 1265.0000  12.00  0.001  0.01 
> 3512 1400.0000  14.00  0.001  0.01 
> 3513 1427.0000  12.00  0.001  0.01 
> 3514 1410.0000   7.00      0     0 
> 3515 1520.0000   8.00  0.001  0.01 
> 3516 1470.0000   7.60      0     0 
> 3517 1170.0000  10.00  0.001  0.01 
> 3518 4570.0000  20.00  0.001  0.45 
> 3519 8560.0000   0.50   0.14  0.31 
> 3520  708.0000  39.00  0.001  0.01 
> 3521  833.0000  40.00   0.01  0.01 
> 3522        NA     NA   <NA>  <NA> 
> 
> Computing the mean:
> 
> aggregate(chemicS$EC, by = list(east=chemicS$EST, north=chemicS$NORD),
> FUN = mean)
> 
> Count   east    north   Mean
> 350    89885   103160  318.50000
> 351    55870   103510  400.00000
> 352    82570   104845  637.33333
> 353    79119   107433         NA
> 354    79160   107462  362.77778
> 355    83010   108990         NA
> 356    82810   109010         NA
> 357    69135   112992         NA
> 358    55490   120140  142.25000
> 359    56580   120600         NA
> 360    56582   120607         NA
> 361    58050   125350         NA
> 362    58059   125360         NA
> 363    60360   128191         NA
> 364    65448   128293  252.50000
> 365  65472.5 128308.1         NA
> 366    61412   131141         NA
> 
> Computing the standard deviation:
> 
> aggregate(chemicS$EC, by = list(east=chemicS$EST, north=chemicS$NORD),
> FUN = sd, na.rm = TRUE)
> 
> Count  east    north     Stdev.
> 350    89885   103160    4.9497475
> 351    55870   103510           NA
> 352    82570   104845   19.6553640
> 353    79119   107433           NA
> 354    79160   107462   73.6745848
> 355    83010   108990           NA
> 356    82810   109010   15.6950098
> 357    69135   112992           NA
> 358    55490   120140    5.3150729
> 359    56580   120600           NA
> 360    56582   120607   22.4435801
> 361    58050   125350           NA
> 362    58059   125360   23.3108523
> 363    60360   128191   20.9789577
> 364    65448   128293   10.6066017
> 365  65472.5 128308.1           NA
> 366    61412   131141    8.6184556
> 

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907



