[Rd] encoding argument of source() in 3.5.0

NELSON, Michael mnel@ @ending from doh@he@lth@n@w@gov@@u
Mon Jun 4 14:41:06 CEST 2018



On R 3.5.0 (Mac), the issue appears when using the default (libcurl) method and specifying the encoding.

Note that method='internal' causes a segfault when used in conjunction with encoding (it works when encoding is not set).

urlR <- "http://home.versanet.de/~s-berman/source2.R"
# works 
url_default <- url(urlR)
scan(url_default, "")
# Read 7 items
# [1] "source.test2"       "<-"                 "function()"         "{"                  "print(\"Non-ascii:" "äöüß\")"           
# [7] "}"                 

url_default_en <- url(urlR, encoding = "UTF-8")
scan(url_default_en, "")
# Read 0 items
# character(0)
url_internal <- url(urlR, method = 'internal')
scan(url_internal, "")
# Read 7 items
# [1] "source.test2"       "<-"                 "function()"         "{"                  "print(\"Non-ascii:" "äöüß\")"           
# [7] "}"                 

url_internal_en <- url(urlR, encoding = "UTF-8", method = 'internal')
#scan(url_internal_en, "")
#*** caught segfault ***
#  address 0x0, cause 'memory not mapped'

url_libcurl <- url(urlR, method = 'libcurl')
scan(url_libcurl, "")
# Read 7 items
# [1] "source.test2"       "<-"                 "function()"         "{"                  "print(\"Non-ascii:" "äöüß\")"           
# [7] "}" 
url_libcurl_en <- url(urlR, encoding = "UTF-8", method = 'libcurl')
scan(url_libcurl_en, "")
# Read 0 items
# character(0)


Michael

________________________________________
From: R-devel [r-devel-bounces using r-project.org] on behalf of Stephen Berman [stephen.berman using gmx.net]
Sent: Monday, 4 June 2018 7:26 PM
To: Martin Maechler
Cc: R-devel
Subject: Re: [Rd] encoding argument of source() in 3.5.0

On Mon, 4 Jun 2018 10:44:11 +0200 Martin Maechler <maechler using stat.math.ethz.ch> wrote:

>>>>>> peter dalgaard
>>>>>>     on Sun, 3 Jun 2018 23:51:24 +0200 writes:
>
>     > Looks like this actually comes from readLines(), nothing
>     > to do with source() as such: In current R-devel (still):
>
>     >> f <- file("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
>     >> readLines(f)
>     > character(0)
>     >> close(f)
>     >> f <- file("http://home.versanet.de/~s-berman/source2.R")
>     >> readLines(f)
>     > [1] "source.test2 <- function() {"   "    print(\"Non-ascii: äöüß\")"
>     > [3] "}"
>
>     > -pd
>
> and that's not even readLines(), but rather how exactly the
> connection is defined [even in your example above]
>
>   > urlR <- "http://home.versanet.de/~s-berman/source2.R"
>   > readLines(urlR, encoding="UTF-8")
>   [1] "source.test2 <- function() {"   "    print(\"Non-ascii: äöüß\")"
>   [3] "}"
>   > f <- file(urlR, encoding = "UTF-8")
>   > readLines(f)
>   character(0)
>
> and the same behavior with scan()  instead of readLines() :
>
>> scan(urlR,"") # works
> Read 7 items
> [1] "source.test2"       "<-"                 "function()"         "{"
> [5] "print(\"Non-ascii:" "äöüß\")"            "}"
>> scan(f,"") # fails
> Read 0 items
> character(0)
>>
>
> So it seems as if the bug is in the file() [or url()] C code ..
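
For reference, the two code paths being compared above treat `encoding` differently, as far as the documentation goes: readLines(encoding=) only marks the strings it returns, while file(encoding=) re-encodes the input as it is read. A minimal sketch with a local temporary file (the file contents and variable names here are made up for illustration, not taken from the report):

```r
# Four Latin-1 bytes (a-umlaut, o-umlaut, u-umlaut, sharp s) plus a newline.
bytes <- as.raw(c(0xe4, 0xf6, 0xfc, 0xdf))
tmp <- tempfile()
writeBin(c(bytes, as.raw(0x0a)), tmp)

# Path 1: encoding passed to readLines() only declares (marks) the encoding;
# the bytes are returned unchanged.
marked <- readLines(tmp, encoding = "latin1")
Encoding(marked)
charToRaw(marked)

# Path 2: encoding passed to file() makes the connection re-encode the input
# to the native encoding while reading (the result depends on the locale).
con <- file(tmp, encoding = "latin1")
converted <- readLines(con)
close(con)
charToRaw(converted)

unlink(tmp)
```

Only the second path goes through the connection's re-encoding machinery, which is consistent with the failure showing up for file(urlR, encoding = "UTF-8") but not for readLines(urlR, encoding = "UTF-8").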

Yes, the problem seems to be restricted to loading files from a
(non-local) URL; i.e. this works fine on my computer:

  > source("file:///home/steve/prog/R/source2.R", encoding="UTF-8")

Also, I noticed this works too:

  > read.table("http://home.versanet.de/~s-berman/table2", encoding="UTF-8", skip=1)

where (if I read the source correctly) using `skip=1' makes read.table()
call readLines().  (The read.table() invocation also works without
`skip'.)
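
Since sourcing a local file with an encoding works, one possible workaround until this is fixed is to download the file first and source the local copy. This is only a sketch under that assumption; the helper name source_url_utf8 is hypothetical and I have not tested it under MS-Windows:

```r
# Hypothetical helper: fetch the script to a temporary file, then source the
# local copy, which avoids url()/file() connections with an encoding set.
source_url_utf8 <- function(u, ...) {
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp), add = TRUE)
  # mode = "wb" keeps the bytes exactly as served (no line-ending translation)
  utils::download.file(u, tmp, mode = "wb", quiet = TRUE)
  source(tmp, encoding = "UTF-8", ...)
}
```

It would be called as source_url_utf8("http://home.versanet.de/~s-berman/source2.R"); any further arguments are passed on to source().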

> But then we also have to consider Windows .. where I think most changes have
> happened during the  R-3.4.4 --> R-3.5.0  transition.

Yes, please.  I need (or at least it would be convenient) to be able to
load R code containing non-ascii characters from the web under
MS-Windows.

Steve Berman

______________________________________________
R-devel using r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
__________________________________________________________________________________________________________
This email has been scanned for the NSW Ministry of Health by the Websense Hosted Email Security System.
Emails and attachments are monitored to ensure compliance with the NSW Ministry of health's Electronic Messaging Policy.
__________________________________________________________________________________________________________



