[R] importing files, columns "invade next column"

Wed Jan 19 20:55:35 CET 2005

On Wed, 2005-01-19 at 19:28 +0000, Tiago R Magalhaes wrote:
> Thanks very much Mark and Prof Ripley
> 
> a) using sep='\t' when using read.table() helps somewhat
> 
> there is still a problem: I cannot get all the lines:
> df <- read.table('file.txt', fill=T, header=T, sep='\t')
> dim(df)
>   9543  195
> 
> while with the shorter file (11 cols) I get all the rows
> dim(df)
>   15797    11
> 
> I have looked at row 9544 where the file seems to stop reading, but I 
> cannot see in any of the cols an obvious reason for this to happen. 
> Any ideas why? Maybe there is one col that is stopping the reading 
> process and that column is not one of the 11 that are present in the 
> smaller file.
> 
> b) fill=T is necessary
> without fill=T, I get an error:
> "line 1892 did not have 195 elements"

Tiago,

How was this data file generated? Is it a raw file created by some other
application or was it an ASCII export, perhaps from a spreadsheet or
database program?

It seems that something is inconsistent in the large data file, either
by design or because a poor export corrupted it.

Knowing how the file was generated would help in the effort to assist
you.
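
In the meantime, a rough way to locate inconsistent lines is to count the
fields on each line with count.fields() and compare against the header.
The sketch below is only an illustration: the temporary file and its
contents are made up as a stand-in for your data, and turning off quote
and comment handling (quote = "", comment.char = "") is my assumption
about what gives the most literal count, not something I know to be the
cause in your file.

```r
# Hypothetical stand-in for the real file: four tab-delimited lines,
# one of which (line 3) is missing its last field.
example_file <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tnote",
             "1\talpha\tok",
             "2\tbeta",
             "3\tgamma\tok"),
           example_file)

# Number of tab-separated fields on each line. Quote and comment
# processing are disabled so that stray ' " or # characters are
# counted as ordinary data.
n_fields <- count.fields(example_file, sep = "\t",
                         quote = "", comment.char = "")
print(n_fields)            # 3 3 2 3

# Distribution of field counts; a clean file shows a single value.
print(table(n_fields))

# Line numbers whose field count differs from the header line.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)           # 3
```

On the real file, replacing example_file with 'file.txt' and then
inspecting readLines('file.txt')[bad_lines] should show what is unusual
about those lines.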

> c) help page for read.table
> I reread the help file for read.table and I would suggest changing 
> it. From what I think I am reading, the '\t' would not be needed to 
> work in my file, but it actually is. From the help page:
> 
>   If 'sep = ""' (the default for 'read.table') the separator is "white 
> space", that is one or more spaces, tabs or newlines.

Under normal circumstances, this should not be a problem, but given the
unknowns about your file, it leaves an open question as to the etiology
of the incorrect import.
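
One case where the two settings do differ is a tab-delimited file with
empty fields: with sep = "" consecutive tabs count as a single run of
white space, whereas with sep = "\t" each tab ends a field. Whether this
applies to your file is a guess on my part; the two-line file below is
invented purely to show the effect.

```r
# Hypothetical two-line file; the second line has an empty middle field
# (two consecutive tabs).
tab_file <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc",
             "1\t\t3"),
           tab_file)

# sep = "" (the read.table default): any run of white space is one
# separator, so the empty field disappears.
n_whitespace <- count.fields(tab_file, sep = "")
print(n_whitespace)   # 3 2

# sep = "\t": every tab is a separator, so the empty field is kept.
n_tab <- count.fields(tab_file, sep = "\t")
print(n_tab)          # 3 3
```

A line that comes up short in this way would be consistent with needing
fill=T under the default separator, and with values appearing to shift
into neighbouring columns.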

Marc