[R] read.csv, header=TRUE but skip the first line

Sun Jun 28 23:38:53 CEST 2009

On 28-Jun-09 21:05:59, Mark Knecht wrote:
> Hi,
> Complete newbie to R here. Just getting started reading manuals and
> playing with data.
> 
> I've managed to successfully get my *.csv files into R, however I
> have to use header=FALSE because the real header starts in line #2.
> The file format looks like:
> 
> PORTFOLIO EQUITY TABLE
> 
> TRADE,MARK-SYS,DATE/TIME,PL/SIZE,PS METHOD,POS SIZE,POS PL,DRAWDOWN,DRAWDOWN(%),EQUITY
> 
> 1,1,1/8/2004 12:57:00 PM,124.00,As Given,1,124.00,0.00,0,"10,124.00"
> 2,1,1/14/2004 9:03:00 AM,-86.00,As Given,1,-86.00,86.00,0.849,"10,038.00"
> 3,1,1/14/2004 11:51:00 AM,-226.00,As Given,1,-226.00,312.00,3.082,"9,812.00"
> 4,1,1/15/2004 12:57:00 PM,134.00,As Given,1,134.00,178.00,1.758,"9,946.00"
> 
> where the words "PORTFOLIO EQUITY TABLE" make up line 1, the rest of
> the text is on line 2, and then the lines starting with numbers are
> the real data. (Spaces added by me for email clarity only.)
> 
> If I remove the first line by hand then I can use header=TRUE and
> things work correctly, but it's not practical for me to remove the
> first line by hand on all these files every day.
> 
> I'd like to understand how I can do the read.csv but skip the first
> line. Possibly read the file, delete the first line and then send it
> to read.csv, or some other way?
> 
> Thanks in advance,
> Mark

Simply use the option "skip=1", as opposed to the default "skip=0".
This then skips the first line of the file and only starts reading
at line 2. With "header=TRUE" (which is the default for read.csv()
anyway), the first line read in (i.e. line 2 of the file) will be
taken as the header, and the remainder as data.
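
As a minimal sketch of that call, the following writes a small file in
the layout Mark describes and reads it back with "skip=1". The
temporary file and the names "path" and "equity" are made up for the
illustration; with real files you would pass your own file name.

```r
# Recreate a small file in the layout from the question
# (temporary file, for illustration only).
path <- tempfile(fileext = ".csv")
writeLines(c(
  "PORTFOLIO EQUITY TABLE",
  "TRADE,MARK-SYS,DATE/TIME,PL/SIZE,PS METHOD,POS SIZE,POS PL,DRAWDOWN,DRAWDOWN(%),EQUITY",
  "1,1,1/8/2004 12:57:00 PM,124.00,As Given,1,124.00,0.00,0,\"10,124.00\"",
  "2,1,1/14/2004 9:03:00 AM,-86.00,As Given,1,-86.00,86.00,0.849,\"10,038.00\""
), path)

# Skip the title line; line 2 of the file then becomes the header
# (header = TRUE is already the default for read.csv()).
equity <- read.csv(path, skip = 1)

str(equity)   # inspect the column names and types that were read
```

Note that read.csv() by default adjusts column names such as
"PL/SIZE" into syntactically valid R names (here "PL.SIZE").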

You should read what is output by

  ?read.csv

One thing that may be tricky for a beginner to get their head round
is that this is *really* the help page for read.table(), and
that read.csv() is in fact a "front end" for read.table() with
different defaults.

In particular, whereas read.table() has default "header=FALSE",
read.csv() has default "header=TRUE". Also, of course, where
read.table() has sep="" (i.e. white space), read.csv() has sep=",".
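
If you want to check these defaults yourself, one way (a small sketch,
not the only way) is to look at the formal arguments of the two
functions with formals():

```r
# Default values of "header" and "sep" in each function
formals(read.table)$header   # FALSE
formals(read.csv)$header     # TRUE
formals(read.table)$sep      # ""  (white space)
formals(read.csv)$sep        # ","
```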

Other options for read.csv() which are not mentioned specifically
in the "usage" line for read.csv() (i.e. are subsumed in "...")
are the same as options mentioned in the "usage" line for read.table()
and have the same defaults.

So, implicitly, "skip" is an option for read.csv() just as it is
for read.table(), and it has the same default, namely "skip=0".
So you can set it to "skip=1" just as you can for read.table()
and it will work in the same way.
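
To illustrate the point, here is a sketch comparing read.csv() with a
read.table() call in which the read.csv() defaults are spelled out.
The tiny two-column input and the names "txt", "via_csv" and
"via_table" are invented for the example, and it assumes a version of
R in which read.table() accepts the "text" argument.

```r
# A cut-down stand-in for the real file: title line, header, two rows
txt <- "PORTFOLIO EQUITY TABLE\nTRADE,PL\n1,124.00\n2,-86.00"

# "skip" is not a named argument of read.csv(); it travels through "..."
"skip" %in% names(formals(read.csv))    # FALSE
"..."  %in% names(formals(read.csv))    # TRUE
formals(read.table)$skip                # 0, the default in both cases

# read.csv() with skip = 1 ...
via_csv <- read.csv(text = txt, skip = 1)

# ... and read.table() with read.csv()'s defaults written out explicitly
via_table <- read.table(text = txt, skip = 1, header = TRUE, sep = ",",
                        quote = "\"", dec = ".", fill = TRUE,
                        comment.char = "")

identical(via_csv, via_table)           # TRUE
```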

The "skip" option is described in "?read.csv" as:
    skip: integer: the number of lines of the data file to skip before
          beginning to read data.

This wording is potentially misleading because of the final word
"data": when "header=TRUE", as in read.csv(), a beginner might take
it to mean the real data part of the file (i.e. what follows the
header). In fact the header line is read after the skipped lines, so
it is not counted in "skip".

More explicitly, it could be written:

    skip: integer: the number of lines of the data file to skip before
          beginning to read the lines in the file.
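
A small sketch of the difference, using an invented two-column input
(the names "txt", "ok" and "wrong" are only for illustration, and the
"text" argument is assumed to be available in your version of R):

```r
# Title line, header line, then two data rows
txt <- "PORTFOLIO EQUITY TABLE\nTRADE,PL\n1,124.00\n2,-86.00"

# skip counts lines from the top of the file, so skip = 1 drops only
# the title; the header is then taken from line 2.
ok <- read.csv(text = txt, skip = 1)
names(ok)    # "TRADE" "PL"
nrow(ok)     # 2

# Reading "skip" as "lines before the real data" would suggest skip = 2,
# but then the first data row is consumed as the header.
wrong <- read.csv(text = txt, skip = 2)
nrow(wrong)  # 1
```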

Hoping this helps!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Jun-09                                       Time: 22:37:46
------------------------------ XFMail ------------------------------