[R] Why Numeric Values Become Factors in Data Frame

Rich Shepard rshepard at appl-ecosys.com
Tue Nov 29 22:48:55 CET 2011


On Tue, 29 Nov 2011, Rich Shepard wrote:

>  Pointers on how to determine why this one variable has some values as
> characters rather than as numerics are needed.

Joshua, Marc, David, Bill, Sarah, Bert, et al.:

   Thank you all for the insights and ideas. It was a valuable lesson and it
helped me fix the problem.

   Somehow my client had URLs in two data cells of the original Excel
spreadsheet. I removed them in my LibreOffice copy and exported the file as
a .csv, but I was still reading a prior version, with the cruft in it, into
R.
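
   For the archives, here is a minimal sketch of how stray text in a column
turns it into a factor and how the offending rows can be located. The data
are made up for illustration (the URL and values are hypothetical, not from
my file), and stringsAsFactors = TRUE is set explicitly on the assumption
that the default may differ between R versions.

```r
# Hypothetical miniature of the problem: one URL among conductivity values.
csv_text <- "site,SC
D-1,630
D-1,633
D-2,http://example.com/report
D-2,503"

# Ask for factors explicitly so the example behaves the same in any R version.
bad <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(bad$SC)   # a factor, because one entry is not a number

# Convert via character; entries that are not numbers become NA.
sc_chr <- as.character(bad$SC)
sc_num <- suppressWarnings(as.numeric(sc_chr))

# Rows whose text is present but fails to parse as a number.
bad_rows <- which(is.na(sc_num) & !is.na(sc_chr))
bad[bad_rows, ]
```

   Going through as.character() first matters: calling as.numeric() directly
on a factor returns the internal level codes, not the values.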

   Now that I have corrected the problem (and fixed mis-entered conductivity
values < 100), the R data frame is correct:

str(waterchem)
'data.frame':	3524 obs. of  39 variables:
  $ site      : Factor w/ 64 levels "D-1","D-2","D-3",..: 1 1 1 1 1 1 ...
  $ sampdate  : Date, format: "2007-12-12" "2008-03-15" ...
  $ Ag        : num  0 0 0 0 0 0 0 0 0 0 ...
  $ Al        : num  0.106 0.08 0.116 0.08 0.08 0.08 0.08 0.08 0.08 0.08 ...
  $ CO3       : num  1 1 6.7 1 1 1 1 1 1 1 ...
  ...
  $ SC        : num  630 633 386 503 83.2 538 1450 1130 1040 940 ...

   I knew there was a non-number in there but didn't see it. Your guidance
not only taught me how to find it, but also made me aware that while I was
searching the cleaned-up text file, R was being fed the old version.
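
   A rough sketch of a check that might have caught the stale file sooner:
confirm which file R actually opens, when it was last saved, and whether any
URL-like text remains in it. The helper name inspect_csv and the search
pattern are my own assumptions for illustration, and the temporary file only
stands in for the old export.

```r
# Hypothetical helper: report which file R will open and any URL-like lines.
inspect_csv <- function(path, pattern = "https?://|www\\.") {
  stopifnot(file.exists(path))
  raw_lines <- readLines(path)
  list(
    full_path = normalizePath(path),                    # file R resolves to
    modified  = file.info(path)$mtime,                  # when it was last saved
    suspect   = grep(pattern, raw_lines, value = TRUE)  # leftover cruft
  )
}

# Demonstration on a throwaway file standing in for the stale export.
tmp <- tempfile(fileext = ".csv")
writeLines(c("site,SC", "D-1,630", "D-2,http://example.com/report"), tmp)
res <- inspect_csv(tmp)
res$suspect
```

   Comparing the modification time with the time of the last export is a
quick way to see whether the cleaned copy is the one being read.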

Very much appreciated,

Rich


