[R] Reading a tab delimited file of varying length using read.table
btupper at bigelow.org
Sun Jan 17 22:46:36 CET 2016
Any software would be challenged to determine the boundaries between your columns.
ff <- 'http://data.princeton.edu/wws509/datasets/divorce.dat'
txt <- readLines(ff)
#  " id heduc heblack mixed years div " " 9 12-15 years No No 10.546 No "
#  " 11 < 12 years No No 34.943 No " " 13 < 12 years No No 2.834 Yes "
#  " 15 < 12 years No No 17.532 Yes " " 33 12-15 years No No 1.418 No
You don't have tab delimiters but instead have space delimiters (well, sort of). Your second column has either one ("12-15 years") or two ("< 12 years") spaces embedded in its values. That will break any scheme that uses spaces to delineate the columns, because the number of space-separated fields differs from line to line.
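To make the problem concrete, here is a small sketch that counts whitespace-separated fields per line with count.fields(). It uses three lines copied from the readLines() output above rather than the full file, so the input is an assumption about what the rest of the file looks like.

```r
# Three lines copied from the readLines() output shown above
txt <- c(" id heduc heblack mixed years div ",
         " 9 12-15 years No No 10.546 No ",
         " 11 < 12 years No No 34.943 No ")

# Count whitespace-separated fields on each line
tc <- textConnection(txt)
n <- count.fields(tc)
close(tc)

# Header has 6 fields; the data lines have 7 and 8
print(n)
```

The header promises six columns, but the data lines split into seven or eight pieces, which is why a whitespace-based read cannot line the values up with the declared columns.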
Perhaps you can read this as fixed width - see ?read.fwf - but you'll have to fiddle with the width specifications.
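If the widths turn out to be awkward, a different route (not read.fwf, but a regular expression applied to the lines from readLines) might also work. The sketch below is only that: it uses a few lines copied from the output above, and it assumes that every education value ends in "years" and that the three flag columns are always "Yes" or "No". Those assumptions come from the sample only and may not hold for the whole file.

```r
# Sample lines copied from the readLines() output shown above
txt <- c(" id heduc heblack mixed years div ",
         " 9 12-15 years No No 10.546 No ",
         " 11 < 12 years No No 34.943 No ",
         " 13 < 12 years No No 2.834 Yes ",
         " 15 < 12 years No No 17.532 Yes ")

body <- txt[-1]  # drop the header line

# id, education (assumed to end in "years"), Yes/No, Yes/No, number, Yes/No
pat <- "^ *([0-9]+) +(.*years) +(Yes|No) +(Yes|No) +([0-9.]+) +(Yes|No) *$"
m <- regmatches(body, regexec(pat, body))

# Keep only lines that matched: full match plus six captured groups
ok <- sapply(m, length) == 7
parts <- do.call(rbind, lapply(m[ok], function(x) x[-1]))

df <- data.frame(id      = as.numeric(parts[, 1]),
                 heduc   = parts[, 2],
                 heblack = parts[, 3],
                 mixed   = parts[, 4],
                 years   = as.numeric(parts[, 5]),
                 div     = parts[, 6],
                 stringsAsFactors = FALSE)
str(df)
```

Lines that do not fit the pattern are silently dropped by the `ok` filter, so it is worth comparing sum(ok) with length(body) before trusting the result.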
> On Jan 17, 2016, at 10:31 AM, Pradeep Bisht <pradeep.bisht0303 at gmail.com> wrote:
> Hello Experts,
> Being a SAS developer, I am finding it difficult to perform some data
> cleaning tasks in R that are quite easy to perform in SAS.
> I have been trying to read a .dat file and, after a lot of attempts, have
> failed to find a solution. Maybe R doesn't have the functionality right
> now, or I am not looking in the right place. Here is my code:
> colClasses = c("numeric", "character", "character","character", "double",
> "character" ) )
> The error I get is this:
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
> scan() expected 'a real', got '912-15yearsNoNo10.546No'
> Also, does read.table always call scan in the background to do its job? If so,
> why use read.table in the first place?
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544