[R] iconv() replaces invalid characters with "  " instead of " " (two spaces instead of one) on unix?

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Mar 14 16:06:30 CET 2015


On 14/03/2015 11:07, Anthony Damico wrote:
> hello, i am trying to replace non-ASCII characters in a character string
> with a single space.  the iconv() function works as i expect it to on
> windows, but on unix, non-ASCII characters are getting replaced with two
> spaces instead of one.  i suppose i could write a workaround for my code,
> but i'm wondering if i'm making some other mistake?

You are (not reading the help, not writing legible English) ...

      sub: character string.  If not ‘NA’ it is used to replace any
           non-convertible bytes in the input.

Note *bytes*, not characters.  In UTF-8 'ó' is two bytes, so it is 
replaced by two copies of 'sub'; other non-ASCII characters take 2, 3 
or 4 bytes (in the current Unicode standard; originally in principle 
up to 6).
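
The byte/character distinction can be checked directly.  This is only a 
minimal sketch: it forces the string to UTF-8 with enc2utf8() so the 
counts do not depend on the session locale, and the variable names are 
illustrative.

```r
# 'ó' written as a Unicode escape; enc2utf8() makes sure it is stored as UTF-8
a <- enc2utf8("cancelaci\u00f3n")

n_chars <- nchar(a, type = "chars")  # number of characters: 11
n_bytes <- nchar(a, type = "bytes")  # number of bytes: 12, since 'ó' needs two

# The raw bytes that encode 'ó' in UTF-8
o_bytes <- charToRaw(enc2utf8("\u00f3"))  # c3 b3

print(c(chars = n_chars, bytes = n_bytes))
print(o_bytes)
```

Neither of those two bytes is valid ASCII, which is why 'sub' is applied 
twice.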

We do not know what locale you used on Windows, but non-CJK Windows 
locales use single-byte encodings, so there characters == bytes.
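
To illustrate the single-byte case, here is a rough sketch that 
re-encodes the string to Latin-1 (used as a stand-in for a Windows 
single-byte code page; that choice is an assumption, since the Windows 
locale is not known).  The exact output of the last two calls can depend 
on the platform's iconv implementation, so none is shown.

```r
a <- enc2utf8("cancelaci\u00f3n")

# Re-encode to Latin-1, where 'ó' is the single byte 0xf3
b <- iconv(a, from = "UTF-8", to = "latin1")
b_bytes <- nchar(b, type = "bytes")  # 11: one byte per character
print(b_bytes)

# One non-convertible byte here, so one substitution is expected
print(iconv(b, from = "latin1", to = "ASCII", sub = " "))

# Two non-convertible bytes here; the post reports "cancelaci  n" on Linux
print(iconv(a, from = "UTF-8", to = "ASCII", sub = " "))
```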

I guess chartr() will do what you want using a character range.
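
One possible workaround that operates on characters rather than bytes is 
a regular expression.  Note that this is a different technique from 
chartr(): it is a sketch that assumes the input is valid UTF-8 and that 
"non-ASCII" means any character above code point 0x7F.

```r
a <- enc2utf8("cancelaci\u00f3n")

# Replace each character outside the ASCII range with a single space.
# With perl = TRUE and a UTF-8 input, the class matches whole characters,
# not individual bytes.
cleaned <- gsub("[^\\x01-\\x7F]", " ", a, perl = TRUE)
print(cleaned)  # "cancelaci n"
```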

>
> in the output below, this is the result i'm getting:
> [1] "cancelaci  n"
>
> and this is the result i want:
> [1] "cancelaci n"
>
> thanks!!
>
> =================
>
>> getOption( "encoding" )
> [1] "windows-1252"

What is the relevance of that?  That option sets the default encoding 
for connections; iconv() does not use it.

>
>> a <- "cancelación"
>> iconv(a,"","ASCII")
> [1] NA
>> iconv(a,"","ASCII",sub=" ")
> [1] "cancelaci  n"
>
> =================
>
>> sessionInfo()
> R version 3.1.2 (2014-10-31)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] R.utils_1.34.0    R.oo_1.18.0       R.methodsS3_1.6.1 descr_1.0.4
>   [5] SAScii_1.0        downloader_0.3    foreign_0.8-61    MonetDB.R_0.9.5
>   [9] digest_0.6.6      DBI_0.3.1
>
> loaded via a namespace (and not attached):
> [1] xtable_1.7-4
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK


