[R] Odd characters at beginning of file

Duncan Murdoch murdoch.duncan at gmail.com
Sat Apr 14 15:37:41 CEST 2012


On 12-04-13 6:20 PM, Lee Hachadoorian wrote:
> I'm use RPostgreSQL to access data on a Postgres server. I would like to
> keep my SQL statements in external files, as they're easier to write and
> debug in pgAdmin, then I use readLines to bring them into R and feed to
> dbGetQuery.
>
> Here's the problem. When I create a SQL script with pgAdmin, then load it
> in R, the ensuing script fails when I feed it to dbGetQuery. When I inspect
> the string in R on Linux, it *looks* OK, but fails. When I inspect the
> string in R on Windows, each file begins with "". (It took me a while to
> figure this out since I usually run R on Linux and the characters weren't
> displaying there.)
>
> I haven't tested a large number of applications but it appears to be a
> pgAdmin problem. Characters appear in SQL scripts created by pgAdmin on
> both Linux and Windows, do not appear in scripts created in Notepad on
> Windows or gedit on Linux. On Windows I think I can clean the string because
>
> sql = readLines("SELECT 1.sql")
> sql
>
> [1] "SELECT 1;"
>
> But on Linux
>
> sql = readLines("SELECT 1.sql")
> sql
>
> [1] "SELECT 1;"
>
> So how do I remove something that isn't even there? And yet the query fails.
>
> So I would like to know if someone knows why this is happening and how to
> avoid it, or how to clean these characters when working in R on Linux where
> they aren't visible in the string. Also, when I open the pgAdmin-created
> files in Notepad or gedit, I don't see anything unusual. But they are still
> there, because if I edit a pgAdmin-created file in Notepad or gedit, then
> save and read into R with readLines, they are there.

That's a byte-order mark.  Unicode standards say it should be invisible 
when printed, so it's probably there in Linux, but you can't see it. 
Windows is reading the file as though it is Latin1 when really it's 
UTF-8, so you see junk characters.

I believe it would be represented in a string in R as \uFEFF, and you 
should be able to do

sql <- sub("\uFEFF", "", sql)

to get rid of it.

Duncan Murdoch



More information about the R-help mailing list