[Rd] Should last default to .Machine$integer.max-1 for substring()

Michael Chirico m|ch@e|ch|r|co4 @end|ng |rom gm@||@com
Mon Jun 21 00:20:26 CEST 2021


Currently, substring defaults to last=1000000L, which strongly
suggests the intent is to default to "nchar(x)" without having to
compute/allocate that up front.

Unfortunately, this default breaks down for "very large" strings,
which may exceed 1000000L in "width": the result is silently truncated
to the first 1000000 characters.
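
To illustrate the truncation, here is a minimal sketch. The
1500000-character length is an arbitrary choice for the example, and
the variable names are only illustrative:

```r
# Build a string longer than the default last = 1000000L
# (1500000 is an arbitrary length picked for this example)
x <- strrep("a", 1500000L)

# With the default 'last', only the first 1000000 characters come back
n_default <- nchar(substring(x, 1L))

# Passing 'last' explicitly recovers the whole string
n_explicit <- nchar(substring(x, 1L, nchar(x)))

print(c(default = n_default, explicit = n_explicit))
```

The explicit call needs nchar(x) computed up front, which is exactly
what the default appears designed to avoid.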

The max width of a string is .Machine$integer.max-1:

# works
x = strrep(" ", .Machine$integer.max-1L)
# fails
x = strrep(" ", .Machine$integer.max)
Error in strrep(" ", .Machine$integer.max) :
  'Calloc' could not allocate memory (18446744071562067968 of 1 bytes)

(see also the comment in src/main/character.c: "Character strings in R
are less than 2^31-1 bytes, so we use int not size_t.")

So it seems to me either .Machine$integer.max or
.Machine$integer.max-1L would be a more sensible default. Am I missing
something?
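
For what it's worth, the proposed behaviour can be approximated today
with a user-level wrapper. This is only a sketch: substring_full is a
hypothetical helper name, not part of base R, and it simply forwards to
substring() with a larger default for 'last':

```r
# Hypothetical wrapper (not part of base R): same as substring(),
# but 'last' defaults to the largest representable integer
substring_full <- function(text, first, last = .Machine$integer.max) {
  substring(text, first, last)
}

# A string longer than the current default of 1000000L
x <- strrep("a", 1500000L)

# The wrapper returns everything from 'first' to the end of the string
n_full <- nchar(substring_full(x, 1L))
print(n_full)
```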

Mike C


