[R] Calculating the mean in one column with empty cells

PIKAL Petr petr.pikal at precheza.cz
Mon Oct 8 17:18:01 CEST 2012


Hi

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
> Sent: Saturday, October 06, 2012 6:17 PM
> To: David Winsemius; fxen3k
> Cc: r-help at r-project.org
> Subject: Re: [R] Calculating the mean in one column with empty cells
> 
> For nine numbers, R-helpers should recommend that people show their
> data with dput(obj) instead of str(obj).
> dput() shows everything in the object to full precision.  str() shows a
> summary of the object and rounds numbers to 2 digits -- it is good for

actually 3 significant digits by default (digits.d = 3), which gives 4 decimal places here

 str(testdata)
 num [1:9] 0.2006 0.1321 0.0564 0.0264 0.0201 ...

but you are right that dput does not hide anything.
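
To make the difference concrete, here is a minimal sketch using the nine numbers from the original post. The digits.d argument of str() is only shown as one way to see more digits; the stored values are the same whichever function displays them.

```r
# The nine values from the original post.
testdata <- c(0.2006160108532920, 0.1321167173880490, 0.0563941428921262,
              0.0264198664609803, 0.0200581303857603, -0.2971754213679500,
              -0.2353086361784190, 0.0667195538296534, 0.1755852636926560)

str(testdata)                 # compact summary; values are rounded for display
str(testdata, digits.d = 10)  # digits.d raises the number of digits shown
dput(testdata)                # deparsed values that can be pasted back into R

# Only the display differs; the mean is computed from the stored values.
mean(testdata)
```

If str() on your own column shows only two decimals while dput() shows the same two decimals, the digits were already lost before the data reached R.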

Anyway, exporting to CSV from Excel-like programs can be rather problematic because of the rounding habits of those programs.
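
One way to check where the digits get lost is to look at the raw text of the CSV file before importing it. The sketch below only simulates the situation: it assumes the spreadsheet wrote the values as displayed, with 2 decimals, and csv_file is a hypothetical temporary file standing in for the real export.

```r
# The nine values from the original post.
testdata <- c(0.2006160108532920, 0.1321167173880490, 0.0563941428921262,
              0.0264198664609803, 0.0200581303857603, -0.2971754213679500,
              -0.2353086361784190, 0.0667195538296534, 0.1755852636926560)

# Assumption: the export kept only 2 decimals. write.csv2() uses ";" as the
# field separator and "," as the decimal mark, as in the original post.
csv_file <- tempfile(fileext = ".csv")  # hypothetical stand-in for the Excel export
write.csv2(data.frame(x = round(testdata, 2)), csv_file, row.names = FALSE)

# Inspect the raw text: if the digits are missing here, R never saw them.
readLines(csv_file)

imported <- read.csv2(csv_file)$x
mean(imported)  # mean of the rounded values
mean(testdata)  # mean of the full-precision values
```

If readLines() on the real file shows the full digits, the problem is elsewhere; if it shows only two decimals, the export has to be fixed on the spreadsheet side.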

The best way to solve this problem is probably to ask Microsoft support for assistance.

Regards
Petr

> an overview of the data, but when the question is "why did I get a mean
> of .066666 instead of .06547494 from my 9 numbers"
> str() is not useful.
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
> 
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
> > Sent: Saturday, October 06, 2012 9:08 AM
> > To: fxen3k
> > Cc: r-help at r-project.org
> > Subject: Re: [R] Calculating the mean in one column with empty cells
> >
> >
> > On Oct 6, 2012, at 1:11 AM, fxen3k wrote:
> >
> > > Hi,
> > >
> > > the first command was bringing the numbers into R directly:
> > > > testdata <- c(0.2006160108532920, 0.1321167173880490,
> > > 0.0563941428921262, 0.0264198664609803, 0.0200581303857603,
> > > -0.2971754213679500, -0.2353086361784190, 0.0667195538296534,
> > > 0.1755852636926560)
> > > > mean(testdata)
> > > [1] 0.0161584
> > >
> > > Here I tried to calculate the mean with the same numbers as given
> > > above, but taken from my dataset.
> > >
> > > > str(dataSet2$ac_bhar_60d_4d_after_ann[1:9])
> > > num [1:9] 0.2 0.13 0.06 0.03 0.02 -0.3 -0.24 0.07 0.18
> > > > mean(dataSet2$ac_bhar_60d_4d_after_ann[1:9])
> > > [1] 0.01666667
> > >
> >
> > This is something that happened in your data processing, not in R:
> >
> > > dat <- read.csv2(text="0,2006160108532920
> > + 0,1321167173880490
> > + 0,0563941428921262
> > + 0,0264198664609803
> > + 0,0200581303857603
> > + -0,2971754213679500
> > + -0,2353086361784190
> > + 0,0667195538296534
> > + 0,1755852636926560
> > + ", header=FALSE)
> > > mean(dat[[1]])
> > [1] 0.0161584
> >
> > >
> >
> > > It seems that in the second case R calculates the mean with rounded
> > > numbers (0.2 and not 0.20061601085...).
> > > Could it be that R imports only the rounded numbers?
> > > How can I build a CSV file with numbers showing all decimal places?
> > > I think my current CSV file only has numbers with 2 decimal places.
> > >
> >
> > That is more likely the fault of Excel than it is something R is
> > responsible for.
> >
> > --
> >
> > David Winsemius, MD
> > Alameda, CA, USA
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 



