[Rd] strsplit and the empty string

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Wed Jun 18 23:22:23 CEST 2008


Oh my, I regret my stupidity.

Christian Brechbühler wrote:
>
> With R version 2.6.1 Patched (2007-11-26 r43541), I get
>     Error in strsplit(" hello dolly ") :
>       argument "split" is missing, with no default
>
> But strsplit(" hello dolly ", " ") reproduces your results.
>
>   
of course, that was my test code.

>
>
> The algorithm, the comment after it, and your results are consistent.
> Whether it is intuitive is a matter of taste.  I agree it's not as
> symmetric as one might like.
>
>   
the problem is that one needs to check for the empty string at the
beginning, if there could be one in the output.  perhaps an additional
logical argument to strsplit would be useful, e.g., 'reduce', 'sparse',
or whatever the name.
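
a rough sketch of what such a flag could do, written in python purely for illustration.  `drop_empty` and its `reduce` parameter are made-up names, not part of strsplit or any existing api, and the input list just stands for the three-element result discussed in this thread.

```python
def drop_empty(pieces, reduce=True):
    """Hypothetical post-processing step: remove empty strings when reduce is true."""
    if not reduce:
        # leave the output untouched, empty strings included
        return list(pieces)
    return [piece for piece in pieces if piece != ""]


# stands in for the result of strsplit(" hello dolly ", " ")
r_style = ["", "hello", "dolly"]
print(drop_empty(r_style))                # ['hello', 'dolly']
print(drop_empty(r_style, reduce=False))  # ['', 'hello', 'dolly']
```

with something like this the caller would not have to test for the leading empty string by hand.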

>> If the pattern matches, (second if above), the match is added to the
>> output, and removed from the input -- which after this step is the empty
>> string;
>>     
>
> Close.  The string to the left of the match, "dolly", is added to the output.
>   
that was what i should have written.  that was all too hasty.

> I agree, the input is now the empty string.
>
>   
>> in the next step, there is no match (else above), so the rest of
>> the input string (= the empty string) *should* be added, but it is not
>> what happens.
>>     
>
> No, in the next step, the string is empty (first 'if' above), and we break.
> The else branch never applies in your example.
>
>   

that's where my stupidity becomes apparent, i'm afraid ,(
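
for the record, the loop as corrected above can be sketched like this.  it is a python approximation of the steps described in this thread, not the actual strsplit source; the function name is hypothetical, and it assumes the pattern never matches the empty string.

```python
import re


def split_like_described(text, pattern):
    """Mimic the three-branch loop discussed in this thread (an approximation)."""
    pieces = []
    while True:
        if text == "":
            # first 'if': the input is empty, so stop without adding anything
            break
        match = re.search(pattern, text)
        if match:
            # second 'if': keep the string to the left of the match,
            # then drop everything up to and including the match
            pieces.append(text[:match.start()])
            text = text[match.end():]
        else:
            # 'else': no match left, keep the rest of the input
            pieces.append(text)
            break
    return pieces


result = split_like_described(" hello dolly ", " ")
print(result)  # ['', 'hello', 'dolly']
```

the leading empty string comes from the match at position 0; after "dolly" is added the input is empty, so the loop breaks before a trailing empty string could be added.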

>> (i see no good
>> reason for including the empty string at the beginning but not at the
>> end of the output; no other language i know would do that this way)
>>     
>
> I checked Perl, and it does exactly the same:
>   print join "==", split / /, " hello dolly "
> ==hello==dolly
> (that's 3 elements: "", "hello",  and "dolly").
>   

indeed, i must have mistyped while checking it.  so the results in a few
other languages are:

3 elements: java (and groovy), oz
4 elements: python, javascript, php, boo
2 elements: ruby

while they are not mutually coherent, the last two approaches are at
least symmetric, which i find much more intuitive.
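
as a quick illustration of the two symmetric behaviours, here is what python's built-in `str.split` does with the same input.  this only shows python; the other languages listed above would need to be checked separately.

```python
text = " hello dolly "

# explicit separator: empty strings are kept at both ends
with_separator = text.split(" ")
print(with_separator)       # ['', 'hello', 'dolly', '']
print(len(with_separator))  # 4

# no separator: runs of whitespace are collapsed and no empty strings appear
without_separator = text.split()
print(without_separator)       # ['hello', 'dolly']
print(len(without_separator))  # 2
```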

thanks for the response.  a good lesson ;)

vQ


