[Rd] [R] split strings

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Thu May 28 16:05:28 CEST 2009


Wacek Kusnierczyk wrote:
> William Dunlap wrote:
>   
>> Would your patched code affect the following
>> use of regexpr's output as input to substr, to
>> pull out the matched text from the string?
>>    > x<-c("ooo","good food","bad")
>>    > r<-regexpr("o+", x)
>>    > substring(x,r,attr(r,"match.length")+r-1)
>>    [1] "ooo" "oo"  ""   
>>   
>>     
>
> no; same output
>
>   
>>    > substr(x,r,attr(r,"match.length")+r-1)
>>    [1] "ooo" "oo"  ""   
>>   
>>     
>
> no; same output
>
>   
>>    > r
>>    [1]  1  2 -1
>>    attr(,"match.length")
>>    [1]  3  2 -1
>>    > attr(r,"match.length")+r-1
>>    [1]  3  3 -3
>>    attr(,"match.length")
>>    [1]  3  2 -1
>>   
>>     
>
> for the positive indices there is no change, as you might expect.
>
> if i understand your concern, the issue is that regexpr returns -1 (with
> the corresponding attribute -1) where there is no match.  in this case,
> you expect "" as the substring. 
>
> if there is no match, we have:
>
>     start = r = -1 (the start index you provide)
>     stop = attr(r, "match.length") + r - 1 = -1 + -1 - 1 = -3 (the stop index you provide)
>
> for a string of length n, my patch computes the final indices as follows:
>
>     start' = n + start - 1
>     stop' = n + stop - 1
>
> whatever the value of n, stop' - start' = stop - start = -3 - 1 = -4. 
>   

except that stop - start = -3 - (-1) = -2, which is still negative,
i.e., stop' < start'.
silly me, sorry.
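
for anyone who wants to check the arithmetic, here is a small sketch using william's example. it runs against stock R, not the patch; the names first, last and n are only for illustration, and the shifted indices follow the formulas quoted in this message rather than the patch source itself.

```r
# william's example: one full match, one partial match, one non-match
x <- c("ooo", "good food", "bad")
r <- regexpr("o+", x)

# start and stop indices as passed to substring(); as.integer() drops
# the attributes that regexpr() attaches to its result
first <- as.integer(r)
last  <- first + attr(r, "match.length") - 1L

last - first              # 2 1 -2: negative only for the non-match

# shifting both indices by the same amount (here n - 1, as in the
# formulas quoted in this message) leaves the difference unchanged
n <- nchar(x)
(n + last - 1L) - (n + first - 1L)

substring(x, first, last) # "ooo" "oo" ""
```

the difference is the same before and after the shift, so the non-match still has stop' < start' and still yields "".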

vQ

> that is, stop' < start', hence an empty string is returned, by virtue of
> the original code.  (see the sources for details.)
>
> does this answer your question?
>
>


