[Rd] type.convert (PR#13646)

Peter Dalgaard p.dalgaard at biostat.ku.dk
Fri Apr 10 22:40:54 CEST 2009


William Dunlap wrote:
> I can reproduce the difference that Stefan saw, depending
> on whether or not I start Rgui with the flags
>     --no-environ --no-Rconsole
> I think it boils down to the isBlankString() function.
> For the string "\247" it returns 1 when those flags are
> not present and 0 when they are.  isBlankString does use
> some locale-specific functions:
> Rboolean isBlankString(const char *s)
> {
> #ifdef SUPPORT_MBCS
>     if(mbcslocale) {
>         wchar_t wc; int used; mbstate_t mb_st;
>         mbs_init(&mb_st);
>         while( (used = Mbrtowc(&wc, s, MB_CUR_MAX, &mb_st)) ) {
>             if(!iswspace(wc)) return FALSE;
>             s += used;
>         }
>     } else
> #endif
>         while (*s)
>             if (!isspace((int)*s++)) return FALSE;
>     return TRUE;
> }
> 
> I was using R 2.8.1, downloaded precompiled from CRAN, on Windows
> XP SP3. The outputs of sessionInfo() and Sys.getenv() are the same
> in both sessions.  'Process Explorer' shows that the 2 sessions
> have the same dll's opened.

Thanks for that analysis Bill!

Stefan was in "German_Austria.1252", which I don't think is multibyte, so
only the else-clause should be relevant, pointing the finger rather
squarely at isspace(). Googling indicates that others have been caught
out by signed/unsigned char issues there. Should this possibly rather read

if (!isspace((unsigned int)*s++)) return FALSE;

??

> 
>> sessionInfo()
> R version 2.8.1 (2008-12-22) 
> i386-pc-mingw32 
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> I did the test with a dll compiled from
> #include <R.h>
> #include <R_ext/Utils.h>
> 
> void test_isBlankString(char **s, int *res)
> {
>    *res = isBlankString(*s) ;
> }
> 
> and called by .C("test_isBlankString","\247",-1L)
> 
> I don't see the difference while running a version of 2.9.0(devel)
> compiled locally on 11 March 2009 (from svn rev 48116).
> 
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com  
> 
>> -----Original Message-----
>> From: r-devel-bounces at r-project.org 
>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Peter Dalgaard
>> Sent: Friday, April 10, 2009 2:03 AM
>> To: Raberger, Stefan
>> Cc: R-bugs at r-project.org; r-devel at stat.math.ethz.ch
>> Subject: Re: [Rd] type.convert (PR#13646)
>>
>> Raberger, Stefan wrote:
>>> Hi Peter,
>>>
>>> each of the four PCs actually has the same locale setting: 
>>>
>>>> Sys.setlocale("LC_CTYPE")
>>> [1] "German_Austria.1252"
>>>
>>> (all the other settings returned by invoking Sys.getlocale() are identical as well).
>>> Just to be sure (because it's displayed incorrectly in my browser on
>>> the bug-tracking page): the character inside the type.convert call
>>> ought to be a "section" sign (HTML code &#167; or &sect;, in R "\247"),
>>> not a dot ".".
>>
>> I saw it correctly. It's "\302\247" in UTF8 locales, which is of course
>> the reason I suspected locale settings, but I can't seem to trigger the
>> NA behaviour.
>>
>> I'm at a loss here, but some ideas:
>>
>> In the cases where it returns NA, what type is it? (I.e.
>> storage.mode(type.convert(....)))
>>
>> What do you get from
>>
>>  > charToRaw("§")
>> [1] c2 a7
>>
>> (a7, presumably, but better check).
>>
>> -p
>>
>>> -----Original Message-----
>>> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk] 
>>> Sent: Thursday, April 09, 2009 19:26
>>> To: Raberger, Stefan
>>> Cc: r-devel at stat.math.ethz.ch; R-bugs at r-project.org
>>> Subject: Re: [Rd] type.convert (PR#13646)
>>>
>>> s.raberger at innovest.at wrote:
>>>> Full_Name: Stefan Raberger
>>>> Version: 2.8.1
>>>> OS: Windows XP
>>>> Submission from: (NULL) (213.185.163.242)
>>>>
>>>>
>>>> Hi there, 
>>>>
>>>> I recently noticed some strange behaviour of the command "type.convert",
>>>> depending on the startup mode used. But there also seems to be different
>>>> behaviour on different PCs (all running the same OS and the same version of R).
>>>> On PC1:
>>>> When I start R in SDI mode (RGui --no-save --no-restore --no-site-file
>>>> --no-init-file --no-environ) and try to convert, the result is
>>>>
>>>>> type.convert("§")
>>>> [1] NA
>>>>
>>>> If I use MDI mode (RGui --no-save --no-restore --no-site-file --no-init-file
>>>> --no-environ --no-Rconsole) instead, the result is
>>>>
>>>>> type.convert("§")
>>>> [1] §
>>>> Levels: §
>>>>
>>>> On PC2 it's exactly the other way round (SDI: §, MDI: NA); on PC3 the
>>>> result is always NA, independent of the startup mode used; and on PC4
>>>> it's always §.
>>>> What's the result I should expect R to return, and why is it different in so
>>>> many cases?
>>> Which locale does R think it is in in the four cases? 
>>> (Sys.setlocale("LC_CTYPE"), I think).
>>>
>>> Might well not be a bug (so please don't file it as one).
>>>
>>>> Any help is much appreciated!
>>>> Regards, Stefan
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>> -- 
>>     O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>>    c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>>   (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>>
>>




