[R] Importing Large Dataset from Excel

Patrick Connolly p_connolly at slingshot.co.nz
Sun Dec 16 10:02:29 CET 2007


On Wed, 12-Dec-2007 at 11:35AM +0100, Peter Dalgaard wrote:

|> Philippe Grosjean wrote:
|> > The problem is often a misspecification of the comment.char argument. 
|> > For read.table(), it defaults to '#'. This means that everywhere you 
|> > have a '#' char in your Excel sheet, the rest of the line is ignored. 
|> > This results in a different number of items per line.
|> >
|> > You would do better to use read.csv(), which has more suitable default
|> > arguments for your particular problem.
|> > Best,
|> >
|> >   
|> Or read.delim/read.delim2, which should be even better at TAB-separated
|> files.
|> 
|> In general, be very suspicious of read.table() with such files, not only
|> because of the '#' but also because by default (sep = "") it expects
|> columns separated by _arbitrary_ amounts of whitespace. I.e., n TABs
|> count as one, so empty fields are skipped over.

I don't recall that happening with TABs, but a problem can arise when
the last (rightmost) column has more than a few empty cells.
Occasionally, I've had to resort to adding a dummy column on the
right, but as Peter suggests, read.delim is usually less involved.
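
A minimal sketch of the ragged last-column case, assuming (this is not
stated above) that the trouble comes from rows that reach R with fewer
separators than the header because their trailing cells were empty. The
file contents and column names are made up for illustration.

```r
# Made-up TAB-separated export; rows 2 and 3 lack the trailing empty field.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tvalue\tcomment",
             "1\t10\tok",
             "2\t20",
             "3\t30",
             "4\t40\tcheck"), tmp)

# read.table() has fill = FALSE by default, so short rows are an error.
strict <- try(read.table(tmp, header = TRUE, sep = "\t"), silent = TRUE)
inherits(strict, "try-error")

# read.delim() sets sep = "\t", fill = TRUE and comment.char = "",
# so the short rows are padded with blank fields instead.
padded <- read.delim(tmp, stringsAsFactors = FALSE)
str(padded)
```

Since fill = TRUE pads silently, it is probably worth looking over the
result with str() or summary() before trusting it.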

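For the two pitfalls quoted above ('#' and runs of whitespace), one way
to see what is going on before importing is count.fields(), which
reports the number of fields found on each line. This is only a sketch
on a made-up three-line file, not the original poster's data.

```r
# Made-up TAB-separated file: an empty middle field on one line and a '#'
# inside a value on the next.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tcode\tnote",
             "1\t\tfirst",
             "2\tA#1\tsecond"), tmp)

# Same defaults as read.table(): sep = "" (any run of whitespace is one
# separator) and comment.char = "#". The two data lines come up short.
n_default <- count.fields(tmp)

# Settings matching read.delim(): TAB as the separator, no comment character.
n_tab <- count.fields(tmp, sep = "\t", comment.char = "")

# Lines on which the two settings disagree are the ones to inspect.
which(n_default != n_tab)
```

If the counts are all equal under the TAB settings, read.delim() (or
read.table() with sep = "\t" and comment.char = "") should read the
file without complaint.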

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___    Patrick Connolly   
 {~._.~}          		 Great minds discuss ideas    
 _( Y )_  	  	        Middle minds discuss events 
(:_~*~_:) 	       		 Small minds discuss people  
 (_)-(_)  	                           ..... Anon
	  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.


