[R] read.table() question

Marc Schwartz marc_schwartz at me.com
Wed Oct 19 23:22:14 CEST 2016


> On Oct 19, 2016, at 4:12 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> 
> 
>> On Oct 19, 2016, at 1:54 PM, Rich Shepard <rshepard at appl-ecosys.com> wrote:
>> 
>> The file, daily_records.dat, contains these data:
>> 
>> "station","date","amount"
>> "0.3E",2014-01-01,
>> "0.3E",2014-01-02,
>> "0.3E",2014-01-03,0.01
>> "0.3E",2014-01-04,0.00
>> "0.3E",2014-01-05,0.00
>> "0.3E",2014-01-06,0.00
>> "0.3E",2014-01-07,0.10
>> "0.3E",2014-01-08,0.22
>> "0.3E",2014-01-09,0.49
>> 
>> Using read.table("daily_records.dat", header = TRUE, sep = ",", quote =
>> "\"\"") the data are assigned to a data.frame named 'rain.'
>> 
>> I expect the structure to show station and date as factors with amount as
>> numeric, but they're all factors:
> 
> I got both station and amounts as numeric:
> 
> dat <- read.table(text='"station","date","amount"
> "0.3E",2014-01-01,
> "0.3E",2014-01-02,
> "0.3E",2014-01-03,0.01
> "0.3E",2014-01-04,0.00
> "0.3E",2014-01-05,0.00
> "0.3E",2014-01-06,0.00
> "0.3E",2014-01-07,0.10
> "0.3E",2014-01-08,0.22
> "0.3E",2014-01-09,0.49', header = TRUE, sep = ",",quote =
> "\"\"")
> str(dat)
> 'data.frame':	9 obs. of  3 variables:
> $ station: num  0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3 0.3
> $ date   : Factor w/ 9 levels "2014-01-01","2014-01-02",..: 1 2 3 4 5 6 7 8 9
> $ amount : num  NA NA 0.01 0 0 0 0.1 0.22 0.49
> 
> 
> Why aren't you using colClasses?


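'station' comes over as numeric because, in the limited data copied here, every value ends in 'E', which is taken as the exponent marker of scientific notation.

It appears that the actual data file also has station values with a 'W' suffix, presumably a directional designation (East versus West), as seen below in Rich's str() output. Those values cannot be parsed as numbers, so the whole column becomes a factor: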

> str(type.convert("0.3E"))
 num 0.3

> str(type.convert("0.3W"))
 Factor w/ 1 level "0.3W": 1

> str(type.convert(c("0.3E", "0.3W")))
 Factor w/ 2 levels "0.3E","0.3W": 1 2
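
Along the lines of David's colClasses question, here is a minimal sketch of taking type.convert() out of the picture by declaring the column types up front. It uses only a few of the rows copied in this thread, and the choice of classes (factor, Date, numeric) is my assumption about what Rich wants:

```r
# A few of the rows from the thread, supplied inline via 'text'
txt <- '"station","date","amount"
"0.3E",2014-01-01,
"0.3E",2014-01-02,
"0.3E",2014-01-03,0.01
"0.3E",2014-01-04,0.00'

# Declare each column's class so nothing is guessed from the content:
# "0.3E" stays a label, dates become Date, empty amounts become NA
dat <- read.table(text = txt, header = TRUE, sep = ",", quote = "\"",
                  colClasses = c("factor", "Date", "numeric"))
str(dat)
```

With the full file, a non-numeric entry in a column declared as "numeric" should make the read stop with an error rather than silently produce a factor, which is another way to learn that such an entry exists.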


As it did for David and Duncan, 'amount' came over as numeric for me as well, again with the limited data here. So, as Duncan noted, there is likely a value somewhere in that column of the full file that is not a proper number, which results in the whole column being coerced to factor when type.convert() is applied (see ?type.convert).
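
One possible way to track down such a value is to look at the column as character and flag the non-empty entries that as.numeric() cannot convert. This is only a sketch: the vector below is made up for illustration, and the "T" entry is a hypothetical bad value, not something taken from Rich's file:

```r
# Made-up 'amount' values as they would look if read as character;
# "T" is a hypothetical non-numeric entry, "" is a missing value
amount <- c("", "0.01", "0.00", "T", "0.22")

# as.numeric() returns NA (with a warning) for anything it cannot parse
num <- suppressWarnings(as.numeric(amount))

# Entries that are non-empty yet failed to convert are the suspects
bad <- which(is.na(num) & nzchar(amount))
amount[bad]
```

Since 'amount' in Rich's data frame is already a factor, simply printing levels(rain$amount) would also list all 48 distinct values, and any oddball should stand out there.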

Regards,

Marc Schwartz


> 
> 
>> 
>> str(rain)
>> 'data.frame':	341 obs. of  3 variables:
>> $ station: Factor w/ 6 levels "0.3E","0.6W",..: 1 1 ...
>> $ date   : Factor w/ 62 levels "2013-12-01","2013-12-02",..: 32 33 34 ...
>> $ amount : Factor w/ 48 levels "","0.00","0.01",..: 1 1 3 2 ...
>> 
>> Why is amount taken as a factor rather than numeric? I do not recall
>> having numbers read as factors before this.
>> 
>> I expect to need to convert dates using as.Date() but not to convert
>> numbers.
>> 
>> TIA,
>> 
>> Rich
> 