[R] Weird read.xls behavior

Gabor Grothendieck ggrothendieck at gmail.com
Tue May 10 17:39:42 CEST 2011


On Tue, May 10, 2011 at 12:12 AM, Jun Shen <jun.shen.ut at gmail.com> wrote:
> Kenneth,
>
> Thanks for the reply. I checked the original data. There is no space. I even
> manually added a space to one value. After reading in with read.xls, the
> value has two spaces. The reason I don't like it is I am going to do some
> comparison with another dataset, which is supposed to be the same as this
> one. Now I am getting a bunch of false negatives.

It seems that the Perl program underlying gdata's read.xls writes out
lines like this:

|"KAI-4169-002","830","5 mg" |
where the | characters mark the beginning and end of the line and are
not part of it. read.csv keeps the space after the last double quote as
part of the last field, even though it's outside the double quotes.
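
To check this without going through the spreadsheet, here is a minimal
sketch in base R. The sample line is the one shown above; the helper
strip_trailing is a hypothetical name introduced only for this
illustration and is not part of gdata. It removes trailing whitespace
from character columns after reading, which should make the comparison
with the other dataset behave regardless of whether the Perl script has
been patched.

```r
# One line as described above: a space follows the closing double quote.
csv_line <- '"KAI-4169-002","830","5 mg" '

raw <- read.csv(text = csv_line, header = FALSE, stringsAsFactors = FALSE)
nchar(raw$V3)   # compare with nchar("5 mg") to see whether the space was kept

# Hypothetical helper: strip trailing whitespace from every character column.
strip_trailing <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(x) sub("[[:space:]]+$", "", x))
  df
}

clean <- strip_trailing(raw)
identical(clean$V3, "5 mg")   # TRUE once trailing whitespace is removed
```

The same helper could be applied to the data frame returned by read.xls
before doing the comparison.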

As an interim fix, edit the file at this location:

   system.file("perl", "xls2csv.pl", package = "gdata")

removing the space before the \n in this line:
   print OutFile "$outputLine \n"
so it becomes this:
   print OutFile "$outputLine\n"

Now it should work.
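
If you would rather not edit the file by hand, the same change could be
scripted from R along these lines. This is only a sketch: patch_xls2csv
is a hypothetical helper name, it assumes the line in your installed
copy of xls2csv.pl matches the text quoted above exactly, and it needs
write access to the gdata installation directory. It keeps a .bak copy
next to the original before writing.

```r
# Hypothetical helper: remove the space before \n in the print statement.
patch_xls2csv <- function(path) {
  lines <- readLines(path)
  # Keep a backup copy; an existing backup is not overwritten.
  file.copy(path, paste0(path, ".bak"), overwrite = FALSE)
  # fixed = TRUE treats "$" and "\" literally; '\\n' is backslash + n,
  # which is how the newline escape appears in the Perl source text.
  patched <- gsub('$outputLine \\n', '$outputLine\\n', lines, fixed = TRUE)
  writeLines(patched, path)
  invisible(sum(patched != lines))   # number of lines that were changed
}

# Usage (assumes gdata is installed and the file is writable):
# patch_xls2csv(system.file("perl", "xls2csv.pl", package = "gdata"))
```

A return value of 0 would mean no matching line was found, in which case
the file is written back unchanged.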

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
