[R] Reading a tab delimited file of varying length using read.table

Ben Tupper btupper at bigelow.org
Sun Jan 17 22:46:36 CET 2016


Hi Pradeep,

Any software would be challenged to determine the boundaries between your columns.

ff <- 'http://data.princeton.edu/wws509/datasets/divorce.dat'
txt <- readLines(ff)
head(txt)
# [1] "       id        heduc   heblack   mixed     years   div  " "       9   12-15 years        No      No    10.546    No  "
# [3] "      11    < 12 years        No      No    34.943    No  " "      13    < 12 years        No      No     2.834   Yes  "
# [5] "      15    < 12 years        No      No    17.532   Yes  " "      33   12-15 years        No      No     1.418    No  "

You don't have tab delimiters; the columns are separated by spaces (well, sort of).  The values in your second column contain embedded spaces: one in "12-15 years" and two in "< 12 years".  That will break any scheme that uses whitespace to delineate the columns.
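
To see the problem concretely, here is a small sketch using two of the data lines from the head(txt) output above. Splitting on runs of whitespace (roughly what sep = "" would do) yields a different number of fields per line, and neither count matches the six names in the header. The names x and fields are just for illustration.

```r
# Two data lines copied from the head(txt) output above
x <- c("       9   12-15 years        No      No    10.546    No  ",
       "      11    < 12 years        No      No    34.943    No  ")

# Split each line on runs of whitespace after trimming the ends
fields <- strsplit(trimws(x), "[[:space:]]+")

# Number of fields found in each line
lengths(fields)
# [1] 7 8
```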

Perhaps you can read this as fixed width - see ?read.fwf - but you'll have to fiddle with the width specifications.
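
Something along these lines might work as a starting point. It is only a sketch: the widths below are my assumption, not measured from your file, so count the characters in your own copy before trusting them. To keep the example self-contained, it builds three sample records padded to those widths and reads them back from a temporary file; widths, nms, recs, tmp and d are names I made up.

```r
# ASSUMED column widths; verify them against the real file
widths <- c(8, 14, 10, 8, 10, 6)
nms <- c("id", "heduc", "heblack", "mixed", "years", "div")

# Build three right-aligned sample records as a stand-in for the real data
fmt <- paste0("%", widths, "s", collapse = "")
recs <- sprintf(fmt,
                c("9", "11", "13"),
                c("12-15 years", "< 12 years", "< 12 years"),
                "No", "No",
                c("10.546", "34.943", "2.834"),
                c("No", "No", "Yes"))

tmp <- tempfile(fileext = ".dat")
writeLines(recs, tmp)

# Read by character position; strip.white removes the padding
d <- read.fwf(tmp, widths = widths, col.names = nms,
              strip.white = TRUE, stringsAsFactors = FALSE)
str(d)
unlink(tmp)
```

For the real file you would pass ff (or a downloaded copy) instead of tmp and add skip = 1 to step over the header line, since read.fwf expects header names to be separated by sep rather than laid out in fixed-width positions.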

Cheers,
Ben


> On Jan 17, 2016, at 10:31 AM, Pradeep Bisht <pradeep.bisht0303 at gmail.com> wrote:
> 
> Hello Experts  ,
> 
> Being a SAS developer I am finding it difficult to perform some of data
> cleaning in R that are quite easy to perform in SAS .
> 
> I have been trying to read a .dat file and after a lot of attempts have
> failed to find a solution . Maybe R doesn't have the functionality right
> now or I am not looking in the right place . Here is my code .
> 
> f5=read.table("http://data.princeton.edu/wws509/datasets/divorce.dat",
> header=T,
> sep="\t",
> colClasses = c("numeric", "character", "character","character", "double",
> "character" ) )
> The error I get is this:
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
> :
> scan() expected 'a real', got '912-15yearsNoNo10.546No'
> 
> Also does read.table always calls scan in background to do its job . If so
> why use read.table in first place .
> 
> Pradeep



Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org


