[Rd] type.convert (PR#13646)

wdunlap at tibco.com wdunlap at tibco.com
Sat Apr 11 01:00:23 CEST 2009


Using the (unsigned int)(unsigned char) double cast in isspace()
resolved the problem in my Windows build.  I put some Rprintf
statements into isBlankString and for type.convert("\247")
it printed
  *s=-89 (4294967207 if unsigned)
    8=isspace(*s)
    8=isspace((unsigned int)*s)
    0=isspace((unsigned int)(unsigned char)*s)
I think the 8 is the value of a random bit of memory.

When I converted S+ to use full 8-bit characters I ran
into the same problem.  The is<class> macros in <ctype.h>
all take an unsigned int argument, and if char was signed you had
to do the double cast to avoid sign extension.  Whoever
designed the interface either didn't worry about 8-bit characters
or had chars that were unsigned by default.

It doesn't look like any of the isspace calls in R do
this double casting.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> Sent: Friday, April 10, 2009 2:50 PM
> To: William Dunlap
> Cc: R-bugs at r-project.org; Raberger, Stefan
> Subject: Re: [Rd] type.convert (PR#13646)
>
> William Dunlap wrote:
> > You may have to use
> >   (unsigned int)(unsigned char)*s++
> > instead of just
> >   (unsigned int)*s++
> > to avoid the sign extension.
>
> Thanks again,
>
> I probably won't be doing the change since I don't have a Windows build
> environment around, and I'm a bit superstitious about fixing bugs that I
> cannot see...
>
> Let me just filter this information into the bug repository for now.
>
> 	-pd
>
> >
> > Bill Dunlap
> > TIBCO Software Inc - Spotfire Division
> > wdunlap tibco.com
> >
> >> -----Original Message-----
> >> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> >> Sent: Friday, April 10, 2009 1:41 PM
> >> To: William Dunlap
> >> Cc: r-devel at r-project.org
> >> Subject: Re: [Rd] type.convert (PR#13646)
> >>
> >> William Dunlap wrote:
> >>> I can reproduce the difference that Stefan saw, depending
> >>> on whether or not I start Rgui with the flags
> >>>     --no-environ --no-Rconsole
> >>> I think it boils down to the isBlankString() function.
> >>> For the string "\247" it returns 1 when those flags are
> >>> not present and 0 when they are.  isBlankString does use
> >>> some locale-specific functions:
> >>> Rboolean isBlankString(const char *s)
> >>> {
> >>> #ifdef SUPPORT_MBCS
> >>>     if(mbcslocale) {
> >>>         wchar_t wc; int used; mbstate_t mb_st;
> >>>         mbs_init(&mb_st);
> >>>         while( (used = Mbrtowc(&wc, s, MB_CUR_MAX, &mb_st)) ) {
> >>>             if(!iswspace(wc)) return FALSE;
> >>>             s += used;
> >>>         }
> >>>     } else
> >>> #endif
> >>>         while (*s)
> >>>             if (!isspace((int)*s++)) return FALSE;
> >>>     return TRUE;
> >>> }
> >>>
> >>> I was using R 2.8.1, downloaded precompiled from CRAN, on Windows
> >>> XP SP3. The outputs of sessionInfo() and Sys.getenv() are the same
> >>> in both sessions.  'Process Explorer' shows that the 2 sessions
> >>> have the same dll's opened.
> >> Thanks for that analysis Bill!
> >>
> >> Stefan was in "German_Austria.1252" which I don't think is multibyte,
> >> so only the else-clause should be relevant, pointing the finger rather
> >> squarely at isspace(). Googling indicates that others have been caught
> >> out by signed/unsigned char issues there. Should this possibly rather
> >> read
> >>
> >> if (!isspace((unsigned int)*s++)) return FALSE;
> >>
> >> ??
> >>
> >>> > sessionInfo()
> >>> R version 2.8.1 (2008-12-22)
> >>> i386-pc-mingw32
> >>>
> >>> locale:
> >>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> >>>
> >>> attached base packages:
> >>> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>>
> >>> I did the test with a dll compiled from
> >>> #include <R.h>
> >>> #include <R_ext/Utils.h>
> >>>
> >>> void test_isBlankString(char **s, int *res)
> >>> {
> >>>    *res = isBlankString(*s) ;
> >>> }
> >>>
> >>> and called by .C("test_isBlankString","\247",-1L)
> >>>
> >>> I don't see the difference while running a version of 2.9.0(devel)
> >>> compiled locally on 11 March 2009 (from svn rev 48116).
> >>>
> >>> Bill Dunlap
> >>> TIBCO Software Inc - Spotfire Division
> >>> wdunlap tibco.com
> >>>
> >>>> -----Original Message-----
> >>>> From: r-devel-bounces at r-project.org
> >>>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Peter Dalgaard
> >>>> Sent: Friday, April 10, 2009 2:03 AM
> >>>> To: Raberger, Stefan
> >>>> Cc: R-bugs at r-project.org; r-devel at stat.math.ethz.ch
> >>>> Subject: Re: [Rd] type.convert (PR#13646)
> >>>>
> >>>> Raberger, Stefan wrote:
> >>>>> Hi Peter,
> >>>>>
> >>>>> each of the four PCs actually has the same locale setting:
> >>>>>
> >>>>>> Sys.setlocale("LC_CTYPE")
> >>>>> [1] "German_Austria.1252"
> >>>>>
> >>>>> (all the other settings returned by invoking Sys.getlocale() are
> >>>>> identical as well).
> >>>>>
> >>>>> Just to be sure (because it's displayed incorrectly in my browser on
> >>>>> the bugtracking page): the character inside the type.convert function
> >>>>> ought to be a "section"-sign (HTML Code &#167; or &sect; , in R
> >>>>> "\247", and not a dot ".").
> >>>>
> >>>> I saw it correctly. It's "\302\247" in UTF8 locales, which is of
> >>>> course the reason I suspected locale settings, but I can't seem to
> >>>> trigger the NA behaviour.
> >>>>
> >>>> I'm at a loss here, but some ideas:
> >>>>
> >>>> In the cases where it returns NA, what type is it? (I.e.
> >>>> storage.mode(type.convert(....)))
> >>>>
> >>>> What do you get from
> >>>>
> >>>>  > charToRaw("§")
> >>>> [1] c2 a7
> >>>>
> >>>> (a7, presumably, but better check).
> >>>>
> >>>> -p
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> >>>>> Sent: Thursday, April 9, 2009 19:26
> >>>>> To: Raberger, Stefan
> >>>>> Cc: r-devel at stat.math.ethz.ch; R-bugs at r-project.org
> >>>>> Subject: Re: [Rd] type.convert (PR#13646)
> >>>>>
> >>>>> s.raberger at innovest.at wrote:
> >>>>>> Full_Name: Stefan Raberger
> >>>>>> Version: 2.8.1
> >>>>>> OS: Windows XP
> >>>>>> Submission from: (NULL) (213.185.163.242)
> >>>>>>
> >>>>>>
> >>>>>> Hi there,
> >>>>>>
> >>>>>> I recently noticed some strange behaviour of the command
> >>>>>> "type.convert", depending on the startup mode used. But there also
> >>>>>> seems to be different behaviour on different PCs (all running the
> >>>>>> same OS and the same version of R).
> >>>>>>
> >>>>>> On PC1:
> >>>>>> When I start R in SDI mode (RGui --no-save --no-restore
> >>>>>> --no-site-file --no-init-file --no-environ) and try to convert, the
> >>>>>> result is
> >>>>>>
> >>>>>>> type.convert("§")
> >>>>>> [1] NA
> >>>>>>
> >>>>>> If I use MDI mode (RGui --no-save --no-restore --no-site-file
> >>>>>> --no-init-file --no-environ --no-Rconsole) instead, the result is
> >>>>>>
> >>>>>>> type.convert("§")
> >>>>>> [1] §
> >>>>>> Levels: §
> >>>>>>
> >>>>>> On PC2 it's exactly the other way round (SDI: §, MDI: NA), on PC3
> >>>>>> the result is always NA, independent of the startup mode used, and
> >>>>>> on PC4 it's always §.
> >>>>>>
> >>>>>> What's the result I should expect R to return, and why is it
> >>>>>> different in so many cases?
> >>>>>
> >>>>> Which locale does R think it is in in the four cases?
> >>>>> (Sys.setlocale("LC_CTYPE"), I think).
> >>>>>
> >>>>> Might well not be a bug (so please don't file it as one).
> >>>>>
> >>>>>> Any help is much appreciated!
> >>>>>> Regards, Stefan
> >>>>>>
> >>>>>> ______________________________________________
> >>>>>> R-devel at r-project.org mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>> --
> >>>>     O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> >>>>    c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> >>>>   (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> >>>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907


