[Rd] do_gsub (PR#10540)

Andrew Runnalls A.R.Runnalls at kent.ac.uk
Fri Jan 25 13:11:11 CET 2008


Hin-Tak,

Apologies for the delay in responding.

On 2008/01/11 Hin-Tak Leung wrote:
> I believe STRING_ELT() works even when pat is R_NilValue, so
> the first line "does the right thing" even if the first element does not exist. I think this is intentional and not a bug.
> 
> Other more knowledgeable people please correct me if I am wrong.
> 
> A.R.Runnalls at kent.ac.uk wrote:
>> In character.c within R 2.6.1, the function do_gsub contains the following code:
>>
>> 1199        if (STRING_ELT(pat, 0) == NA_STRING) {
>> 1200            PROTECT(ans = allocVector(STRSXP, n));
>> 1201            for(i = 0; i < n; i++)  SET_STRING_ELT(ans, i, NA_STRING);
>> 1202            UNPROTECT(1);
>> 1203            return ans;
>> 1204        }
>> 1205
>> 1206        if (length(pat) < 1 || length(rep) < 1) error(R_MSG_IA);
>>
>> Line 1206 checks that pat contains at least one element. However,
>> line 1199 has already attempted to access the first element. The check
>> should surely come first. 

Generally speaking, in the interests of software reliability and
maintainability, it is a good idea for part of a program to 'do the
right thing' itself - in the present case, to check that an array is
non-empty before assuming that it is - rather than relying on what is in
fact rather unusual behaviour of another part of the program.  The case
I am concerned about is not in fact where 'pat' is R_NilValue, but when
it is a vector of length zero.  When asked to create such a vector,
memory.c will in fact allocate some element space in the vector, which
is why the above problem doesn't manifest itself as a crash (in the
no-segfault.R test, for example, which explores this case); however, it
seems unwise to rely on this.  And what would happen if that
(redundantly) allocated space just happened already to contain NA_STRING?

Best regards,
Andrew



More information about the R-devel mailing list