[R] Help on using read.table with files containing dates

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Jun 25 15:16:55 CEST 2003


On Wed, 25 Jun 2003, Chriss, Neil wrote:

> I am a relatively new user of R and have a question about using the
> read.table command for reading in .csv files that contain dates.  

Unfortunately they appear to contain perversions of dates, not the dates 
recognised by the ISO 8601 standard (yyyy-mm-dd).  I read your dates as the first of
Jan, Apr and May 2003, but some illogical people use mm/dd/yy order.
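
To make the ambiguity concrete, here is a small sketch. It assumes a current 
version of R in which as.Date() is available; the string is taken from the 
sample data quoted below.

```r
# The same string means different days depending on the assumed field order.
x <- "1/4/2003"

day_first   <- as.Date(x, format = "%d/%m/%Y")  # read as 1 April 2003
month_first <- as.Date(x, format = "%m/%d/%Y")  # read as 4 January 2003

format(day_first)
format(month_first)
```

Writing the format out explicitly, rather than relying on a default, 
removes the guesswork for anyone reading the code later.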

> My
> ultimate goal is to be able to read several different files into data frames
> and then do a merge along the "Date" column of each data frame.  It seems I
> should be able to specify the name of the column that contains dates and
> then automatically convert that to dates, all in a single read.table
> statement.  It's such a natural thing to do (and the help for read.table
> gives me hope, but I have not been able to figure out some of the options,
> e.g., colNames).  
>  
> Here is a specific example of what I mean and what the problems are.  An
>  
> First I load a sample data file whose first column is a date.  This gives
> the wrong answer as we shall see:
>  
> > library(date)
> > sampData <- read.table("sampleData.csv",sep=",",header=TRUE)
> > sampData
>       Date Col1 Col2   Col3
> 1 1/1/2003  1.2  1.4  0.160
> 2 1/4/2003  1.8  1.2  0.900
> 3 1/5/2003  0.9  1.1 -0.003
> > mode(sampData$Date)
> [1] "numeric"
> > 
> 
> Note that the Date column is coerced incorrectly into being numeric. 

It is not.  mode() is inappropriate here: you actually have a factor,
which is stored as integer codes and so has mode numeric.  Use class(),
not mode().
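
A minimal sketch of the difference, building the factor by hand so that it 
does not depend on how any particular R version reads character columns:

```r
# A factor is a vector of integer codes plus a character "levels" attribute.
dates <- factor(c("1/1/2003", "1/4/2003", "1/5/2003"))

mode(dates)          # describes the underlying storage of the codes
class(dates)         # describes how R will treat the object
as.character(dates)  # recovers the original strings from the levels
```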

> Right
> now what I do is force R to read this in as a character using "as.is" and
> then convert it to a date as follows:
>  
> > sampData <- read.table("sampleData.csv",sep=",",header=TRUE,as.is=1)
> > sampData
>       Date Col1 Col2   Col3
> 1 1/1/2003  1.2  1.4  0.160
> 2 1/4/2003  1.8  1.2  0.900
> 3 1/5/2003  0.9  1.1 -0.003
> > mode(sampData$Date)
> [1] "character"
> > sampData$Date <- as.date(sampData$Date)
> > sampData
>    Date Col1 Col2   Col3
> 1 15706  1.2  1.4  0.160
> 2 15709  1.8  1.2  0.900
> 3 15710  0.9  1.1 -0.003
> > 
> 
> Now the Date column is converted to Julian, which is good b/c if I repeat
> this procedure with other dataframes I can do a merge on the date columns
> and line up the dates.  But, there are two drawbacks with this approach
> which I do not know how to solve:  
>  
> 1.  it's generally clumsy and takes too many lines.
>  
> 2.  (more importantly) it forces me to specify which column number contains
> the dates instead of simply stating which column header name contains date.
> I would rather simply specify the column header name of the column that contains the
> date (in this example: Date) so that whichever column contains the date will
> be automatically converted to dates.
>  
> So, is there a way to specify that I want the column labeled "Date" to be
> read as a date class (as in as.date)?

Yes, that's what the argument colClasses is for.  But since as.date is
not part of R (it is in contributed packages survival and date) you will 
need to provide an as() method.
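
One way this could look is sketched below.  The class name "dmyDate" is 
hypothetical and exists only to hang a conversion on; the sketch assumes a 
current R in which as.Date() and the text= argument of read.table are 
available, and it assumes day-first dates.  The data are the sample rows 
from the question, supplied inline instead of from sampleData.csv.

```r
library(methods)

# Hypothetical class name, used only so colClasses can trigger a conversion.
setClass("dmyDate")
setAs("character", "dmyDate",
      function(from) as.Date(from, format = "%d/%m/%Y"))

# Inline copy of the sample data from the question.
csv_text <- "Date,Col1,Col2,Col3
1/1/2003,1.2,1.4,0.160
1/4/2003,1.8,1.2,0.900
1/5/2003,0.9,1.1,-0.003"

# A named colClasses vector selects the column by header name, not position;
# the remaining columns are converted as usual.
sampData <- read.table(text = csv_text, sep = ",", header = TRUE,
                       colClasses = c(Date = "dmyDate"))

class(sampData$Date)
```

Because the conversion is attached to the column name, the same call should 
keep working if the Date column moves to a different position in the file.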

I suggest you use

sampData[["Date"]] <- strptime(as.character(sampData[["Date"]]), "%d/%m/%Y")

since the conversion to factor can be undone very easily.
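
As a sketch of how this feeds into the merge you are after: the two data 
frames and the helper to_date() below are made up for illustration, and the 
as.Date() step assumes a current version of R.  strptime() returns a 
list-like date-time object, so the sketch converts the result to a plain 
date before merging.

```r
# Two hypothetical data frames sharing a "Date" column stored as a factor.
a <- data.frame(Date = factor(c("1/1/2003", "1/4/2003", "1/5/2003")),
                Col1 = c(1.2, 1.8, 0.9))
b <- data.frame(Date = factor(c("1/4/2003", "1/5/2003", "1/6/2003")),
                Other = c(10, 20, 30))

# Hypothetical helper: undo the factor, parse day-first, keep only the date.
to_date <- function(x) {
  as.Date(strptime(as.character(x), "%d/%m/%Y", tz = "UTC"))
}

a$Date <- to_date(a$Date)
b$Date <- to_date(b$Date)

# Rows are matched on the parsed dates, by column name.
merged <- merge(a, b, by = "Date")
merged
```

Note that the day numbers in your as.date output (15706, 15709, 15710) are 
three days and one day apart, which fits 1, 4 and 5 January better than the 
first of three different months.  If your files really are month-first, use 
"%m/%d/%Y" in place of "%d/%m/%Y".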

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



