[R] read.csv : double quoted numbers

Aval Sarri aval.sarri at gmail.com
Fri Aug 22 10:03:36 CEST 2008


On Thu, Aug 21, 2008 at 12:54 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:

> The problem is the commas, not the quotes:

First, thanks to everyone for taking the time to help me.

As Gabor rightly pointed out, the commas were my problem, not the
double quotes (in the numeric values).  This solves my problem, but I
have a couple more questions.

1) Is it recommended to set the default locale to C when working with R?

I am asking because the file that I am reading has an .xls extension (but
contains text only). The file command on a Linux terminal reports
"ASCII English text, with very long lines, with CRLF line
terminators."

gsub fails on this file because of one particular value, "Côte d'Ivoire",
which is present in the file. This happens when my locale is set to
en_US.utf8; when I set the locale to C, gsub works. So is it okay
to set the default locale to C?
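
One way to look at the gsub failure without changing the locale is sketched
below. It assumes the file is really Latin-1 encoded, so that the "ô" is
stored as the single byte 0xF4, which is not valid UTF-8 on its own; the
thread does not confirm the encoding, and the sample line is made up for
illustration. Under that assumption, either converting the text to UTF-8
with iconv() or asking gsub() to match byte-wise with useBytes = TRUE may
be enough.

```r
# Hypothetical sample line: "Côte d'Ivoire" with ô stored as the Latin-1
# byte 0xF4, followed by a tab and a quoted number with a thousands comma.
raw_line <- rawToChar(c(as.raw(c(0x43, 0xF4)),
                        charToRaw("te d'Ivoire\t\"1,001.23\"")))

# Option A: convert to UTF-8 first (ASSUMES the source encoding is Latin-1).
utf8_line <- iconv(raw_line, from = "latin1", to = "UTF-8")
clean_a <- gsub(",", "", utf8_line, fixed = TRUE)

# Option B: leave the bytes untouched and let gsub compare byte by byte.
clean_b <- gsub(",", "", raw_line, fixed = TRUE, useBytes = TRUE)
```

Option A keeps the country name readable in a UTF-8 session; Option B makes
no claim about the encoding and only removes the commas.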

2) The other problem is escaping the double quote in pipe and sed. I have
tried almost every combination that came to my limited mind.

> a = scan(pipe("sed -e s/\"//g WEOall.xls"), sep="\t")
> a = scan(pipe('sed -e s/\"//g WEOall.xls'),  sep="\t")
> a = scan(pipe("sed -r s/\\"//g WEOall.xls"), sep="\t")
> a = scan(pipe("sed -e s/\\\\"//g WEOall.xls"), sep="\t")
> a = scan(pipe('sed -e s/"//g WEOall.xls'), sep="\t")
> a = scan(pipe(paste('sed -e s/\"//g WEOall.xls')),sep="\t")
> a = scan(pipe(paste('sed -e s/\\\\"//g WEOall.xls')),sep="\t")

For all of the above I am getting "sh: Syntax error: Unterminated quoted string"
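
The error comes from sh, not from R: whichever R quoting is used, R consumes
its own quotes and escapes, and the shell then receives a bare " that it
treats as the start of a quoted string. A minimal sketch of one way around
this is to make the shell see the sed script inside single quotes, here
built with the base R function shQuote(). The variable names and the
strip_quotes helper are hypothetical, and the pipe is only attempted when
sed and the file are present.

```r
# Build the command so that sh receives the sed script in single quotes.
sed_script <- 's/"//g'
input_file <- "WEOall.xls"   # file name taken from the attempts above
cmd <- paste("sed -e", shQuote(sed_script, type = "sh"),
             shQuote(input_file, type = "sh"))
cat(cmd, "\n")               # inspect what sh will be given before running it

# Only try the pipe when both sed and the file are actually available.
# what = "" reads character fields, since the commas are still present.
if (nzchar(Sys.which("sed")) && file.exists(input_file)) {
  a <- scan(pipe(cmd), what = "", sep = "\t", quote = "")
}

# Pure-R alternative that avoids the shell entirely (hypothetical helper).
strip_quotes <- function(lines) gsub('"', "", lines, fixed = TRUE)
strip_quotes('"1,001.23"\t"1,008,000.456"')
```

Writing the command by hand as pipe("sed -e 's/\"//g' WEOall.xls") should be
equivalent; shQuote() just removes the need to count backslashes.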

> sessionInfo()
R version 2.6.2 (2008-02-08)
i486-pc-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Thanks and Regards
Aval S.







>> Lines.raw <- '"1,001.23""1,008,000.456"'
>> Lines <- readLines(textConnection(Lines.raw))
>> Lines <- gsub(",", "", Lines)
>> read.table(textConnection(Lines))
>       V1      V2
> 1 1001.23 1008000




>
>
>
> On Wed, Aug 20, 2008 at 2:27 PM, Aval Sarri <aval.sarri at gmail.com> wrote:
>> Hello;
>>
>> I am a new user of R, so pardon me.
>>
>> I am reading a .txt file that has 50+ numeric columns with '\t'
>> as separator. I am using the read.csv function along with colClasses, but
>> it fails to recognize double-quoted numeric values. (My numeric
>> values look like "1,001.23"; "1,008,000.456".)   Basically,
>> read.csv fails with: "scan() expected 'a real', got '"1,044.059"'".
>>
>> What I have tried and problems with them:
>>
>>
>> 1) I tried scan and pipe but got the following error message; that is,
>> how do I replace all double quotes with nothing? I tried enclosing the sed
>> command in single quotes but that does not help.
>> (Though the sed command works from the shell.)
>>
>> scan(pipe("sed -e s/\"//g DataAll.txt"), sep="\t")
>> sh: Syntax error: Unterminated quoted string
>>
>> 2) One solution I found on the mailing list was setAs(), described here
>> http://www.nabble.com/Re%3A--R--read.table()-and-scientific-notation-p6734890.html
>>
>> 3) Other than using as.is=TRUE and then calling as.numeric on the numeric
>> columns, what is the solution? And how do I efficiently convert
>> 50+ columns to numeric using a regular expression? All my
>> numeric column names start with the character 'X', so how do I use sapply
>> and/or a regular expression to convert all columns starting with X to
>> numeric? Is there an alternative method?
>>
>> Basically 2 and 3 work, but which one is the efficient and correct way to do this?
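
For approach 3, a minimal sketch of converting every column whose name
starts with "X" might look like the following. The three-line sample and its
column names (Country, X1980, X1981) are made up for illustration; the idea
is to read everything as character, select the columns with grep(), and
strip the thousands commas before calling as.numeric().

```r
# Hypothetical tab-separated sample standing in for the real file.
sample_lines <- c(
  "Country\tX1980\tX1981",
  "A\t\"1,001.23\"\t\"1,008,000.456\"",
  "B\t\"2,500\"\t\"3.5\""
)

# Read every column as character so nothing is coerced prematurely.
df <- read.delim(textConnection(sample_lines), colClasses = "character")

# Select columns named X..., drop the commas, then convert to numeric.
x_cols <- grep("^X", names(df))
df[x_cols] <- lapply(df[x_cols], function(col) {
  as.numeric(gsub(",", "", col, fixed = TRUE))
})

str(df)
```

lapply is used rather than sapply so that each column is replaced by a
vector of its own and the data frame structure is preserved.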
>>
>> (Also, what is the most efficient way to apply field-level validation and
>> conversion while reading a file? Does one have to read the file first, with
>> validation and conversion happening only afterwards?)
>>
>> Thanks for taking the time to read through this mail.
>>
>> Thanks and Regards
>> -Aval
>>
>

