[R] Why Numeric Values Become Factors in Data Frame

David Winsemius dwinsemius at comcast.net
Tue Nov 29 20:37:53 CET 2011


On Nov 29, 2011, at 2:18 PM, Rich Shepard wrote:

>  I have a data frame with 1 factor, one date, and 37 numeric values:
> str(waterchem)
> 'data.frame':	3525 obs. of  39 variables:
> $ site      : Factor w/ 64 levels "D-1","D-2","D-3",..: 1 1 1 1 1 ...
> $ sampdate  : Date, format: "2007-12-12" "2008-03-15" ...
> $ CO3       : num  1 1 6.7 1 1 1 1 1 1 1 ...
> $ HCO3      : num  231 228 118 246 157 208 338 285 260 240 ...
> $ Ca        : num  100 88.4 63.4 123 78.2 103 265 213 178 166 ...
> $ DO        : num  4.96 9.91 4.32 2.58 1.81 5.09 3.98 5.46 1.9 2.52 ...
> ...
> $ SC        : Factor w/ 841 levels "1.090","10.000",..: 635 638 363
>
>  All the numeric categories are read in as numbers except for some of
> those in column 'SC'. I have been looking in the source file for a couple
> of hours trying to learn why values such as 1.090 and 10.000 are seen as
> characters rather than numbers. I've not seen the reason.
>
>  The source file is 860K and looks like this:
>
> site|sampdate|'Ag'|'Al'|'CO3'|'HCO3'|'Alk-Tot'|'As'|'Ba'|'Be'|'Bi'|'Ca'|
> 'Cd'|'Cl'|'Co'|'Cr'|'Cu'|'DO'|'Fe'|'Hg'|'K'|'Mg'|'Mn'|'Mo'|'Na'|'NH4'|
> 'NO3-NO2'|'Oil-grease'|'Pb'|'pH'|'Sb'|'SC'|'Se'|'SO4'|'Sr'|'TDS'|'Tl'|
> 'V'|'Zn'
> 'D-1'|'2007-12-12'|0.000|0.106|1.000|231.000|231.000|0.011|0.000| 
> 0.002|0.000|100.000|0.000|1.430|0.000|0.006|0.024|4.960|4.110|NA| 
> 0.000|9.560|0.035|0.000|0.970|0.010|0.293|NA|0.025|7.800|0.001| 
> 630.000|0.001|65.800|0.000|320.000|0.001|0.000|11.400
> 'D-1'|'2008-03-15'|0.000|0.080|1.000|228.000|228.000|0.001|0.000| 
> 0.002|0.000|88.400|0.000|1.340|0.000|0.006|0.014|9.910|0.309|0.000| 
> 0.000|9.150|0.047|0.000|0.820|0.224|0.020|NA|0.025|7.940|0.001| 
> 633.000|0.001|75.400|0.000|300.000|0.001|0.000|12.400
>
>  The R command used to create the data frame is:
>        waterchem <- read.table('wqR.txt', header = TRUE, sep = '|')
>
>  Pointers on how to determine why this one variable has some values as
> characters rather than as numerics are needed.

So what does this show?

grep("[^0-9.]", waterchem$SC)
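
For what it's worth, here is a minimal sketch of that check on made-up data. The `demo` data frame and the "ND" entry are invented for illustration; nothing here says what is actually sitting in wqR.txt. A single entry that is not a number is enough for read.table to give up on numeric for the whole column. stringsAsFactors = TRUE is spelled out because R 4.0.0 and later default to FALSE, in which case the column would come back as character rather than factor.

```r
# Made-up stand-in for the real file; the "ND" entry is invented to play
# the part of whatever non-numeric text is hiding in the SC column.
txt <- "site|SC
'D-1'|630.000
'D-1'|633.000
'D-2'|ND
'D-2'|1.090
'D-3'|NA"

# stringsAsFactors = TRUE reproduces the 2011 default behaviour.
demo <- read.table(text = txt, header = TRUE, sep = "|",
                   stringsAsFactors = TRUE)
str(demo)

# Positions of entries holding anything other than digits or a period.
bad <- grep("[^0-9.]", demo$SC)

# The same search, returning the offending text itself.
bad_values <- grep("[^0-9.]", as.character(demo$SC), value = TRUE)
print(unique(bad_values))

# Once the culprits are known: convert through character, not straight
# to numeric, because as.numeric() on a factor returns the level codes.
# Entries that are not numbers become NA (warning suppressed here).
sc_num <- suppressWarnings(as.numeric(as.character(demo$SC)))
```

On the real data, the positions returned by grep can be used to index waterchem$SC and see which entries the source file needs fixing for.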



David Winsemius, MD
West Hartford, CT


