[R] Row limit for read.table

Wed Jan 17 17:40:10 CET 2007

Frank McCown wrote:
> I have been trying to read in a large data set using read.table, but 
> I've only been able to grab the first 50,871 rows of the total 122,269 rows.
>
> > f <- read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat",
>       header=TRUE, nrows=123000, comment.char="", sep="\t")
> > length(f$change_rate)
> [1] 50871
>
> From searching the email archives, I believe this is due to size limits
> of a data frame.  So...
>
>   
It is not due to size limits, see below.
> 1) Why doesn't read.table give a proper warning when it doesn't place 
> every read item into a data frame?
>
>   
In your case, read.table behaves as documented.
The ' character is one of the default quoting characters (quote = "\"'").
A very few of the entries contain a single ' character, so a quoted field
opens on one line and only closes many lines later; sometimes more than
ten thousand lines are treated as a single entry. Try using quote="" to
disable quoting, as documented on the help page:

f <- read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat",
                header=TRUE, nrows=123000, comment.char="", sep="\t", quote="")
length(f$change_rate)
[1] 122271
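To see the mechanism without downloading the original file, here is a minimal sketch on a few made-up tab-separated lines (the data and the names `lines`, `bad` and `good` are invented for illustration). It assumes a recent R: the `text=` argument of read.table only exists from R 2.14.0 on, so on older versions pass textConnection(lines) as the file instead.

```r
# Hypothetical tab-separated data: two of the names contain an apostrophe.
lines <- c("id\tname",
           "1\tO'Brien",
           "2\tSmith",
           "3\tD'Arcy",
           "4\tJones")

# Default quoting (quote = "\"'"): the ' in O'Brien opens a quoted field
# that only closes at the ' in D'Arcy, so the rows with id 2 and 3 end up
# inside the name of row 1 and no warning is given.
bad <- read.table(text = lines, header = TRUE, sep = "\t")
nrow(bad)   # fewer than the 4 data lines

# Quoting disabled: every input line becomes exactly one row.
good <- read.table(text = lines, header = TRUE, sep = "\t", quote = "")
nrow(good)  # 4

# Cheap sanity check: rows read should equal the non-header lines
# (assumes no field legitimately contains an embedded newline).
stopifnot(nrow(good) == length(lines) - 1)
```

The final stopifnot() is one simple way to catch the silent row loss asked about in question 1.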

> 2) Why isn't there a parameter to read.table that allows the user to 
> specify which columns s/he is interested in?  This functionality would 
> allow extraneous columns to be ignored which would improve memory usage.
>
>   
There is: colClasses, which works as documented. A "NULL" entry in
colClasses skips the corresponding column. Try

f <- read.table("http://www.cs.odu.edu/~fmccown/R/Tchange_rates_crawled.dat",
                header=TRUE, nrows=123000, comment.char="", sep="\t", quote="",
                colClasses=c("character","NULL","NULL","NULL","NULL"))
dim(f)
[1] 122271      1
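The same idea on a toy file: a sketch with five invented columns (the column names and values are hypothetical, since the layout of Tchange_rates_crawled.dat is not shown here), keeping only the first and the last column, which is the "2 columns I need" case from the question.

```r
# Hypothetical five-column, tab-separated data (invented for illustration).
lines <- c("url\tsize\tdate\tstatus\tchange_rate",
           "a.html\t100\t2006-01-01\t200\t0.5",
           "b.html\t250\t2006-01-02\t404\t0.1")

# One colClasses entry per column; "NULL" means the column is skipped
# while reading, so it never takes up memory in the resulting data frame.
f <- read.table(text = lines, header = TRUE, sep = "\t", quote = "",
                colClasses = c("character", "NULL", "NULL", "NULL", "numeric"))
dim(f)     # 2 2
names(f)   # "url" "change_rate"
```

Giving a real class (here "character" and "numeric") for the kept columns also spares read.table the work of guessing their types.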

> I've already made a work-around by loading the table into mysql and 
> doing a select on the 2 columns I need.  I just wonder why the above 2 
> points aren't implemented.  Maybe they are and I'm totally missing it.
>
>   
Did you read the help page?
> Thanks,
> Frank
Regards,
   Martin