[R] data import: strange experience

David Carlson dcarlson at tamu.edu
Wed Aug 21 19:07:15 CEST 2013


You should be able to figure it out if you print the four
factor levels that read.table() missed:

setdiff(levels(dfcsv$Var), levels(dftxt$Var))

The main differences are that read.table() includes ' in its
default quote= argument, so an apostrophe starts a quoted
string, and it treats # as a comment character (and therefore
discards it and everything after it on that line).

The base function is read.table() and it includes the following
defaults:

quote="\"'", comment.char="#"

Functions read.csv() and read.delim() call read.table() but
change those defaults to

quote="\"", comment.char=""
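
To see the effect of those defaults, here is a minimal sketch.
The file contents, the column names (ID, Var) and the object
names (dfdelim, dffix) are invented for illustration and are not
SH's data; I am assuming the problem values contain # or '.
stringsAsFactors=TRUE is set explicitly because recent versions
of R no longer convert strings to factors by default.

```r
# Write a small, hypothetical tab-delimited file to a temporary path.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "ID\tVar",
  "1\tAlpha",
  "2\tItem #2",
  "3\tItem #3",
  "4\tGamma",
  "5\tDelta",
  "6\tEpsilon",
  "7\tO'Brien",
  "8\tBeta",
  "9\tD'Arcy",
  "10\tZeta"
), tmp)

# read.table() defaults: quote = "\"'" and comment.char = "#"
dftxt <- read.table(tmp, header = TRUE, sep = "\t",
                    stringsAsFactors = TRUE)

# read.delim() uses quote = "\"" and comment.char = ""
dfdelim <- read.delim(tmp, stringsAsFactors = TRUE)

# read.table() with the read.delim() settings supplied by hand
dffix <- read.table(tmp, header = TRUE, sep = "\t",
                    quote = "\"", comment.char = "",
                    stringsAsFactors = TRUE)

# Compare the number of levels each call produced
nlevels(dftxt$Var)
nlevels(dfdelim$Var)

# Values that the default read.table() call altered or merged
setdiff(levels(dfdelim$Var), levels(dftxt$Var))

# The hand-corrected call should agree with read.delim()
identical(levels(dffix$Var), levels(dfdelim$Var))
```

If the real file behaves like this one, passing quote="\"" and
comment.char="" to read.table() should give the same levels as
read.delim().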

David

From: SH [mailto:emptican at gmail.com] 
Sent: Wednesday, August 21, 2013 10:14 AM
To: dcarlson at tamu.edu; peter dalgaard
Cc: r-help
Subject: Re: [R] data import: strange experience

Thanks Peter.  It works with read.delim.
 
David: Thanks for your comments.  To answer your questions: I
don't have any 'NA' values and the data are all balanced.  Four
levels were missing, and the problem affected only those four
levels.  Yes, there are embedded commas and some other
characters (e.g., '-', spaces, some weird characters in the
middle of names, etc.).  I can send you sample data if you are
willing to take a look.  Even though 'read.delim' works, I am
still curious about what caused the problem and about potential
problems that I may be missing.
 
Thanks again,
 
SH
 
 

On Wed, Aug 21, 2013 at 10:58 AM, David Carlson
<dcarlson at tamu.edu> wrote:
This is not really enough information to diagnose the problem.
What are the missing factor levels? Were the missing levels
combined with another level or do you have missing values (NA)
for those observations? Do the extra factor levels include
embedded commas? There are differences between read.table and
read.csv in the default quote= and comment.char= arguments.

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of SH
Sent: Wednesday, August 21, 2013 9:36 AM
To: r-help at r-project.org
Subject: [R] data import: strange experience

Dear List:

I had a strange experience importing data.  I wonder if any of
you have had the same problem before, and I would greatly
appreciate your suggestions.

The original data set is in Excel format.

Here is a brief summary of what I did:
1. I saved the original Excel data in csv and txt formats,
separately.
2. I imported the two files using the following code.  There
were no error messages.
dftxt = read.table('df.txt',header=T, sep='\t')
dfcsv = read.csv('df.csv',header=T, sep=',')
3. When I checked the data with 'str', I found that the factor
levels of one variable differed between the two data frames.
dftxt had fewer levels than dfcsv (48 vs 52).
4. So I checked the 'df.txt' file and found that the missing
levels were still there, i.e., there is no problem in the text
file.  I suspect that something happened when I imported it
into R.

Since there were no errors when importing the file into R, I
have no idea where to start fixing it.  Do you have any
suggestions?

Thank you very much in advance,

SH
