[Rd] type.convert (PR#13646)

p.dalgaard at biostat.ku.dk
Fri Apr 10 23:55:25 CEST 2009


William Dunlap wrote:
> You may have to use
>   (unsigned int)(unsigned char)*s++
> instead of just
>   (unsigned int)*s++
> to avoid the sign extension.

Thanks again,

I probably won't be doing the change since I don't have a Windows build 
environment around, and I'm a bit superstitious about fixing bugs that I 
cannot see...

Let me just filter this information into the bug repository for now.
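
For the record, a minimal sketch of what the single-byte branch might look like with Bill's cast applied. The function name is_blank_single_byte is made up for illustration and is not in the R sources, and this is untested on Windows:

```c
#include <ctype.h>

/* Hypothetical stand-alone version of the non-MBCS branch of
 * isBlankString(); the name is illustrative, not from the R sources.
 * Returns 1 if s is empty or contains only whitespace, else 0. */
int is_blank_single_byte(const char *s)
{
    while (*s) {
        /* Go through unsigned char first so that bytes >= 0x80 reach
         * isspace() as 128..255 rather than as negative values. */
        if (!isspace((unsigned char)*s++)) return 0;
    }
    return 1;
}
```

The argument is still promoted to int when isspace() is called, so the extra (unsigned int) is optional; the cast to unsigned char is the part that matters.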

	-pd

> 
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com  
> 
>> -----Original Message-----
>> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk] 
>> Sent: Friday, April 10, 2009 1:41 PM
>> To: William Dunlap
>> Cc: r-devel at r-project.org
>> Subject: Re: [Rd] type.convert (PR#13646)
>>
>> William Dunlap wrote:
>>> I can reproduce the difference that Stefan saw, depending
>>> on whether or not I start Rgui with the flags
>>>     --no-environ --no-Rconsole
>>> I think it boils down to the isBlankString() function.
>>> For the string "\247" it returns 1 when those flags are
>>> not present and 0 when they are.  isBlankString does use
>>> some locale-specific functions:
>>> Rboolean isBlankString(const char *s)
>>> {
>>> #ifdef SUPPORT_MBCS
>>>     if(mbcslocale) {
>>>         wchar_t wc; int used; mbstate_t mb_st;
>>>         mbs_init(&mb_st);
>>>         while( (used = Mbrtowc(&wc, s, MB_CUR_MAX, &mb_st)) ) {
>>>             if(!iswspace(wc)) return FALSE;
>>>             s += used;
>>>         }
>>>     } else
>>> #endif
>>>         while (*s)
>>>             if (!isspace((int)*s++)) return FALSE;
>>>     return TRUE;
>>> }
>>>
>>> I was using R 2.8.1, downloaded precompiled from CRAN, on Windows
>>> XP SP3. The outputs of sessionInfo() and Sys.getenv() are the same
>>> in both sessions.  'Process Explorer' shows that the 2 sessions
>>> have the same dll's opened.
>> Thanks for that analysis Bill!
>>
>> Stefan was in "German_Austria.1252" which I don't think is multibyte,
>> so only the else-clause should be relevant, pointing the finger rather
>> squarely at isspace(). Googling indicates that others have been caught
>> out by signed/unsigned char issues there. Should this possibly rather
>> read
>>
>> if (!isspace((unsigned int)*s++)) return FALSE;
>>
>> ??
>>
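
A note on the cast, as a hedged sketch: converting a plain char straight to unsigned int does not drop the sign where char is signed, because the value is already negative before it is widened. The two helper names below are made up for illustration only:

```c
/* Direct cast: where plain char is signed, a byte such as '\247' is
 * negative, and the conversion wraps to a very large unsigned value. */
unsigned int widen_direct(char c)
{
    return (unsigned int)c;
}

/* Cast through unsigned char first: the result is always 0..UCHAR_MAX. */
unsigned int widen_via_uchar(char c)
{
    return (unsigned int)(unsigned char)c;
}
```

Assuming a signed plain char and a 32-bit unsigned int, widen_direct('\247') gives 0xFFFFFFA7 while widen_via_uchar('\247') gives 0xA7. The C standard only defines isspace() for arguments representable as unsigned char (or EOF), so only the second form is safe to pass on.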
>>> > sessionInfo()
>>> R version 2.8.1 (2008-12-22) 
>>> i386-pc-mingw32 
>>>
>>> locale:
>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base     
>>>
>>> I did the test with a dll compiled from
>>> #include <R.h>
>>> #include <R_ext/Utils.h>
>>>
>>> void test_isBlankString(char **s, int *res)
>>> {
>>>    *res = isBlankString(*s) ;
>>> }
>>>
>>> and called by .C("test_isBlankString","\247",-1L)
>>>
>>> I don't see the difference while running a version of 2.9.0(devel)
>>> compiled locally on 11 March 2009 (from svn rev 48116).
>>>
>>> Bill Dunlap
>>> TIBCO Software Inc - Spotfire Division
>>> wdunlap tibco.com  
>>>
>>>> -----Original Message-----
>>>> From: r-devel-bounces at r-project.org 
>>>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Peter Dalgaard
>>>> Sent: Friday, April 10, 2009 2:03 AM
>>>> To: Raberger, Stefan
>>>> Cc: R-bugs at r-project.org; r-devel at stat.math.ethz.ch
>>>> Subject: Re: [Rd] type.convert (PR#13646)
>>>>
>>>> Raberger, Stefan wrote:
>>>>> Hi Peter,
>>>>>
>>>>> each of the four PCs actually has the same locale setting: 
>>>>>
>>>>>> Sys.setlocale("LC_CTYPE")
>>>>> [1] "German_Austria.1252"
>>>>>
>>>>> (all the other settings returned by invoking Sys.getlocale() are
>>>>> identical as well).
>>>>>
>>>>> Just to be sure (because it's displayed incorrectly in my browser on
>>>>> the bugtracking page): the character inside the type.convert function
>>>>> ought to be a "section"-sign (HTML Code &#167; or &sect;, in R "\247",
>>>>> and not a dot ".").
>>>>
>>>> I saw it correctly. It's "\302\247" in UTF8 locales, which is of course
>>>> the reason I suspected locale settings, but I can't seem to trigger the
>>>> NA behaviour.
>>>>
>>>> I'm at a loss here, but some ideas:
>>>>
>>>> In the cases where it returns NA, what type is it? (I.e. 
>>>> storage.mode(type.convert(....)))
>>>>
>>>> What do you get from
>>>>
>>>>  > charToRaw("§")
>>>> [1] c2 a7
>>>>
>>>> (a7, presumably, but better check).
>>>>
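
For reference, the "c2 a7" above is what the UTF-8 encoding rule gives for the single byte a7. A minimal sketch of that rule for code points up to U+00FF follows; latin1_to_utf8 is a hypothetical helper, and it assumes the input byte is Latin-1 (CP1252 agrees with Latin-1 at a7):

```c
#include <stddef.h>

/* Encode a Latin-1 byte (U+0000..U+00FF) as UTF-8.
 * Writes 1 or 2 bytes to out and returns the number written.
 * Hypothetical helper, for illustration only. */
size_t latin1_to_utf8(unsigned char c, unsigned char out[2])
{
    if (c < 0x80) {
        out[0] = c;                                /* ASCII: unchanged */
        return 1;
    }
    out[0] = (unsigned char)(0xC0 | (c >> 6));     /* lead byte: 110xxxxx */
    out[1] = (unsigned char)(0x80 | (c & 0x3F));   /* continuation: 10xxxxxx */
    return 2;
}
```

So "\247" in a single-byte locale and "\302\247" in a UTF-8 locale are the same character; only the byte sequences differ.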
>>>> -p
>>>>
>>>>> -----Original Message-----
>>>>> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk] 
>>>>> Sent: Thursday, April 9, 2009 19:26
>>>>> To: Raberger, Stefan
>>>>> Cc: r-devel at stat.math.ethz.ch; R-bugs at r-project.org
>>>>> Subject: Re: [Rd] type.convert (PR#13646)
>>>>>
>>>>> s.raberger at innovest.at wrote:
>>>>>> Full_Name: Stefan Raberger
>>>>>> Version: 2.8.1
>>>>>> OS: Windows XP
>>>>>> Submission from: (NULL) (213.185.163.242)
>>>>>>
>>>>>>
>>>>>> Hi there, 
>>>>>>
>>>>>> I recently noticed some strange behaviour of the command "type.convert",
>>>>>> depending on the startup mode used. But there also seems to be different
>>>>>> behaviour on different PCs (all running the same OS and the same version
>>>>>> of R).
>>>>>>
>>>>>> On PC1:
>>>>>> When I start R in SDI mode (RGui --no-save --no-restore --no-site-file
>>>>>> --no-init-file --no-environ) and try to convert, the result is
>>>>>>
>>>>>>> type.convert("§")
>>>>>> [1] NA
>>>>>>
>>>>>> If I use MDI mode (RGui --no-save --no-restore --no-site-file
>>>>>> --no-init-file --no-environ --no-Rconsole) instead, the result is
>>>>>>
>>>>>>> type.convert("§")
>>>>>> [1] §
>>>>>> Levels: §
>>>>>>
>>>>>> On PC2 it's exactly the other way round (SDI: §, MDI: NA), on PC3 the
>>>>>> result is always NA, independent of the startup mode used, and on PC4
>>>>>> it's always §.
>>>>>>
>>>>>> What's the result I should expect R to return, and why is it different
>>>>>> in so many cases?
>>>>> Which locale does R think it is in in the four cases? 
>>>>> (Sys.setlocale("LC_CTYPE"), I think).
>>>>>
>>>>> Might well not be a bug (so please don't file it as one).
>>>>>
>>>>>> Any help is much appreciated!
>>>>>> Regards, Stefan
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>> -- 
>>>>     O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>>>>    c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>>>>   (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
>>>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>>


-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907


