[R] importing csv gets me all 16,000 columns with "NA"

(Ted Harding) ted.harding at wlandres.net
Fri Oct 22 12:14:21 CEST 2010


On 21-Oct-10 23:23:03, mkinseth wrote:
> I'm new to R. I have a mac (OS10.6). I have converted an Excel
> file to a csv to import into R. I have used many methods to
> import the file, most do not work, the best so far is:
> 
> filename <- read.csv(/Users/Desktop/csvfile.csv",
>                      header=T, sep=",").
> I have also tried taking out the header and sep lines and it
> still imports fine. 
> 
> Regardless of what I do, it always brings in the data but also
> includes all 16,000+ columns from Excel with X.1 as the first
> column up to X.16345 (for instance). Of course the files are
> filled with "NA" since no data is present.
> 
> How do I get around this?
> 
> So far I have included strip.white=T and fill=T, something I've
> seen in other csv import posts but this does not work.
> 
> Thanks!

"strip.white=T" is not going to solve the problem, since this
only changes the values of character fields which are present
(by removing leading and trailing blanks). It will not remove
blank fields.

Nor will "fill=T": if there is, anywhere in the CSV file, a
row with more commas in it than the others, then *all* the
rows will be extended so as to have the same number of fields
as the excessively long one.

See the result of entering
  ?read.csv
for more detail.

Since, you say, you have converted the Excel file to CSV
yourself, the solution is probably in your own hands.

Excel is notorious for messing up data structures in ways
which are not readily perceptible to the user. One way in
which this can happen is that if, during data entry, the
cursor wanders beyond the intended limits of the data set
(e.g. to a row below the number of rows in the data, or to
a column to the right of the number of columns in the data)
then, without the user entering anything whatever into the
cell where the cursor happens to be, Excel may register
that cell as having content. The user is unlikely to be
aware that this has happened, but once it has happened
the CSV file will contain a rectanguler block which includes
this "outlying" cell, hence too many rows and/or too many
columns.

One way to try to overcome this, in Excel, is to select the
area of the table to be exported prior to initiating the
export to CSV. I won't advise on techniques for that, since
I haven't used an Excel version later than Win98 ... !
(and I have never happy that I have done the right thing in
any Excel operation). I hope others will give detailed advice.

Then you should find that the CSV contains only as many rows
and columns as you have in your data and you should be OK.

It may simply be that, by default, your Excel has saved the
whole spreadsheet, so you get all the columns. The same
solution should apply.

Hoping that this help[s.
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Oct-10                                       Time: 11:14:19
------------------------------ XFMail ------------------------------



More information about the R-help mailing list