[R] more on paste and bug

Thomas Lumley tlumley at u.washington.edu
Wed Oct 10 21:45:18 CEST 2001


On 10 Oct 2001, Saikat DebRoy wrote:
> As it happens, I think the problem is in the read.dta code. The relevant
> piece of code is in foreign/src/stataread.c (lines 317-324):
>
>     default:
>         charlen=INTEGER(types)[j]-STATA_STRINGOFFSET;
>         PROTECT(tmp=allocString(charlen+1));
>         InStringBinary(fp,charlen,CHAR(tmp));
>         CHAR(tmp)[charlen]=0;
>         SET_STRING_ELT(VECTOR_ELT(df,j),i,tmp);
>         UNPROTECT(1);
>         break;
>
> As it happens, in this case the string "A" is written in the file
> as two bytes (I do not know why) with the second byte being '\0'.
> So the above code creates a CHARSXP of length 3 with last two bytes
> being '\0'.
>

It happens because Stata treats strings as a fixed-length type, padded on
the right with nulls.  I didn't realise that R would incorporate trailing
nulls into the string.

It's easily fixed by just reading into a buffer and using strlen before
allocString.
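
As a rough illustration of that fix, here is a minimal sketch in plain C,
outside of R's internals. The helper name read_padded_string is
hypothetical and is not part of the foreign package; malloc stands in for
allocString, and fread stands in for InStringBinary. It reads the
fixed-width field into a temporary buffer, measures it with strlen, and
only then allocates a string of the real length.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from stataread.c): read a fixed-width,
 * null-padded field of `width` bytes from `fp` and return a freshly
 * malloc'ed C string trimmed to its real length.
 * Returns NULL on a short read or on allocation failure. */
char *read_padded_string(FILE *fp, size_t width)
{
    /* One extra byte so the buffer is terminated even when the field
     * is completely full and carries no padding. */
    char *buf = malloc(width + 1);
    if (buf == NULL)
        return NULL;

    if (fread(buf, 1, width, fp) != width) {
        free(buf);
        return NULL;
    }
    buf[width] = '\0';

    /* strlen stops at the first null, so the padding is not counted. */
    size_t len = strlen(buf);

    /* Allocate exactly len + 1 bytes and copy the string with its
     * single terminator. */
    char *out = malloc(len + 1);
    if (out != NULL)
        memcpy(out, buf, len + 1);
    free(buf);
    return out;
}
```

With a two-byte field holding 'A' followed by '\0', this returns a string
whose allocated length matches its strlen of 1. In the real code the
second allocation would presumably be the allocString call, sized from
strlen rather than from the declared field width.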

It might be a bug that the LENGTH() of a string can be longer than its
strlen, though.

	-thomas

