[R] Why Numeric Values Become Factors in Data Frame

Rich Shepard rshepard at appl-ecosys.com
Tue Nov 29 20:18:30 CET 2011


   I have a data frame with one factor, one date, and 37 numeric variables:
str(waterchem)
'data.frame':	3525 obs. of  39 variables:
  $ site      : Factor w/ 64 levels "D-1","D-2","D-3",..: 1 1 1 1 1 ...
  $ sampdate  : Date, format: "2007-12-12" "2008-03-15" ...
  $ CO3       : num  1 1 6.7 1 1 1 1 1 1 1 ...
  $ HCO3      : num  231 228 118 246 157 208 338 285 260 240 ...
  $ Ca        : num  100 88.4 63.4 123 78.2 103 265 213 178 166 ...
  $ DO        : num  4.96 9.91 4.32 2.58 1.81 5.09 3.98 5.46 1.9 2.52 ...
  ...
  $ SC        : Factor w/ 841 levels "1.090","10.000",..: 635 638 363

   All the numeric variables are read in as numbers except for those in
column 'SC', which is read as a factor. I have been looking in the source file
for a couple of hours trying to learn why values such as 1.090 and 10.000 are
seen as characters rather than numbers. I've not seen the reason.

   The source file is 860K and looks like this:

site|sampdate|'Ag'|'Al'|'CO3'|'HCO3'|'Alk-Tot'|'As'|'Ba'|'Be'|'Bi'|'Ca'|'Cd'|'Cl'|'Co'|'Cr'|'Cu'|'DO'|'Fe'|'Hg'|'K'|'Mg'|'Mn'|'Mo'|'Na'|'NH4'|'NO3-NO2'|'Oil-grease'|'Pb'|'pH'|'Sb'|'SC'|'Se'|'SO4'|'Sr'|'TDS'|'Tl'|'V'|'Zn'
'D-1'|'2007-12-12'|0.000|0.106|1.000|231.000|231.000|0.011|0.000|0.002|0.000|100.000|0.000|1.430|0.000|0.006|0.024|4.960|4.110|NA|0.000|9.560|0.035|0.000|0.970|0.010|0.293|NA|0.025|7.800|0.001|630.000|0.001|65.800|0.000|320.000|0.001|0.000|11.400
'D-1'|'2008-03-15'|0.000|0.080|1.000|228.000|228.000|0.001|0.000|0.002|0.000|88.400|0.000|1.340|0.000|0.006|0.014|9.910|0.309|0.000|0.000|9.150|0.047|0.000|0.820|0.224|0.020|NA|0.025|7.940|0.001|633.000|0.001|75.400|0.000|300.000|0.001|0.000|12.400

   The R command used to create the data frame is:
         waterchem <- read.table('wqR.txt', header = TRUE, sep = '|')

   Pointers on how to determine why this one variable has some values read as
characters rather than as numerics are needed.
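
   One possible way to hunt for the offending entries is sketched below on a
made-up four-row file rather than on wqR.txt itself. The '<5.000' entry and
the names chem, sc_text, sc_num and bad are assumptions for illustration only;
the idea is to read the column as text and list the values that are not NA yet
fail conversion to numeric.

```r
# Hypothetical miniature of the source file. The "<5.000" entry is invented
# purely to illustrate the diagnosis; the real culprit may look different.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "site|sampdate|'pH'|'SC'",
  "'D-1'|'2007-12-12'|7.800|630.000",
  "'D-1'|'2008-03-15'|7.940|633.000",
  "'D-2'|'2008-03-15'|7.100|<5.000",
  "'D-2'|'2008-06-20'|7.300|NA"
), tmp)

# Keep text columns as character vectors instead of factors.
chem <- read.table(tmp, header = TRUE, sep = "|", stringsAsFactors = FALSE)

# Entries that are not NA but cannot be converted to a number are the suspects.
sc_text <- as.character(chem$SC)
sc_num  <- suppressWarnings(as.numeric(sc_text))
bad     <- !is.na(sc_text) & is.na(sc_num)

bad_rows   <- which(bad) + 1L      # +1 accounts for the header line in the file
bad_values <- unique(sc_text[bad])

# Show the file line and the raw text of each suspect entry.
print(data.frame(file_line = bad_rows, SC = sc_text[bad]))
```

   On an existing data frame the same check can be run without re-reading the
file by starting from as.character(waterchem$SC). Calling as.numeric() directly
on the factor would return the integer level codes, not the original values.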

Rich


