[Rd] Error in substring: invalid multibyte string

Toby Hocking
Sat Jun 27 00:57:06 CEST 2020


Hi all,
I'm getting the following error from substr:

> substr("<I>Jens Oehlschl\xe4gel-Akiyoshi", 1, 100)
Error in substr("<I>Jens Oehlschl\xe4gel-Akiyoshi", 1, 100) :
  invalid multibyte string at '<e4>gel-A<6b>iyoshi'

Is that normal / intended? I've tried setting the Encoding/locale to
Latin-1/UTF-8, but that does not help. nchar gives me a similar error:

> nchar("<I>Jens Oehlschl\xe4gel-Akiyoshi")
Error in nchar("<I>Jens Oehlschl\xe4gel-Akiyoshi") :
  invalid multibyte string, element 1

I find it strange that substr/nchar give an error but regexpr works for
telling me the length:

> regexpr(".*", "<I>Jens Oehlschl\xe4gel-Akiyoshi")
[1] 1
attr(,"match.length")
[1] 29

Is that inconsistency normal/intended?
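
In case it helps narrow this down, here is a minimal diagnostic sketch. It assumes (my guess from the name, not something I have confirmed) that the \xe4 byte is a Latin-1 "ä". It checks whether the string is valid UTF-8, counts bytes rather than characters, and converts an explicit copy with iconv. The variable names x and y are just for illustration.

```r
# The problematic string; \xe4 is a single raw byte in the literal.
x <- "<I>Jens Oehlschl\xe4gel-Akiyoshi"

# Is the byte sequence valid UTF-8? (validUTF8 is in base R >= 3.3.0)
is_valid <- validUTF8(x)

# Counting bytes does not need to decode the string, so it does not error.
n_bytes <- nchar(x, type = "bytes")

# Assumption: the bytes are Latin-1. Convert a copy to UTF-8 explicitly.
y <- iconv(x, from = "latin1", to = "UTF-8")

# After conversion, character-based functions can decode the string.
n_chars <- nchar(y)
first_part <- substr(y, 1, 100)

print(is_valid)
print(n_bytes)
print(n_chars)
```

The byte count here matches the match.length that regexpr reports above, which makes me suspect regexpr is counting bytes for this input, but I have not verified that.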

btw this example comes from our very own list:

> readLines("https://stat.ethz.ch/pipermail/r-devel/1999-November/author.html")[28]
[1] "<I>Jens Oehlschl\xe4gel-Akiyoshi"

Best,
Toby
