[R] text file imported incorrectly

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Sep 4 09:43:02 CEST 2008


Please do read the help page (as you were asked to do before posting). 
See the 'quote' argument.

This is also covered in the 'R Data Import/Export Manual'.
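
As a minimal sketch of what the 'quote' argument does (the three sample rows and the temporary file below are made up for illustration, and 'skip' is omitted because the sample has no header lines to skip): by default both read.table() and count.fields() use quote = "\"'", so an apostrophe as in INT'L opens a quoted string. Setting quote = "" turns quote handling off.

```r
# Hypothetical sample: three tab-delimited rows, two of which contain
# an apostrophe in the company name.
sample_rows <- c(
  "BANK INT'L INDONESIA\t1\t2",
  "ANZ BANKING GROUP\t3\t4",
  "BELLE INT'L HLDGS(CN)\t5\t6"
)
textfile <- tempfile(fileext = ".txt")
writeLines(sample_rows, textfile)

# With quoting disabled, every row should report the same field count.
n_fields <- count.fields(textfile, sep = "\t", quote = "")
print(n_fields)

# Read the file with quoting disabled so that ' is an ordinary character.
x <- read.table(textfile, sep = "\t", quote = "",
                stringsAsFactors = FALSE)
print(x$V1)
```

If some fields in the real file are wrapped in double quotes, quote = "\"" (double quote only) may be the better choice; that depends on the file and is not something this sketch can decide.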

On Thu, 4 Sep 2008, Weiyang Lim wrote:

> Dear R-users,
>
> When I tried to import a tab-delimited text file with 2000+ rows using the following command (importData in S reads the same file without problems),
>
> x <- read.table(textfile, sep = "\t", skip = 5, stringsAsFactors = FALSE)
>
> I received the following warning message: Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : number of items read is not a multiple of the number of columns. I checked the resulting data frame and found only about 1500 observations rather than the 2000+ I expected.
>
> Then I ran count.fields(textfile, sep="\t"), which showed that my rows have either 4 fields or 294 fields (there are 294 variables altogether). When I checked the rows that count.fields reported as having only 4 fields, I realized that the problem is quite likely due to one of my variables, the company name.
>
> The "problematic" rows have names such as:
> BANK INT'L INDONESIA
> BEIJING CAP INT'L AIRP H
> BELLE INT'L HLDGS(CN)
>
> The other non-problematic rows have names like
>
> ANZ BANKING GROUP
> BABCOCK & BROWN
> BEC WORLD
>
> which did not give problems.
>
> I believe the ' symbol is causing this variable for some of these rows to be read incorrectly. How do I read this field such that the names
>
> BANK INT'L INDONESIA
> BEIJING CAP INT'L AIRP H
> BELLE INT'L HLDGS(CN) etc
>
> can each be interpreted as a single field, so that every row is read by R with 294 fields? What is the correct command to issue?
>
> Hope I am not unclear in my explanation of my problem.
>
> Hope to have your kind assistance!
>
> Best Regards,
> wy

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


