[R] more on paste and bug

Ott Toomet siim at obs.ee
Tue Oct 9 15:03:31 CEST 2001


Hi,

dput( ce0) gives a correct answer:
> dput( ce0)
c("1985", "9", "2", "2", "1", "A", "1", "", "NA", "5", "1999" )

Plain print( ce0) gives the same:
> print( ce0)
 [1] "1985" "9"    "2"    "2"    "1"    "A"    "1"    ""     "NA"   "5"
[11] "1999"

However, if I make a new similar vector ce0a:
> ce0a <- c( 1985,9,2,2,1,"A",1,"",NA,5,1999)

Then the paste works correctly:
> paste( ce0a, m, sep="", collapse="")
[1] "1985<1>9<2>2<3>2<4>1<5>A<6>1<7><8>NA<9>5<0>1999END"

I had m as
> m
 [1] "<1>" "<2>" "<3>" "<4>" "<5>" "<6>" "<7>" "<8>" "<9>" "<0>" "END"

So I have two apparently similar vectors which behave differently with
paste:
> paste( ce0a, m, sep="", collapse="")
[1] "1985<1>9<2>2<3>2<4>1<5>A<6>1<7><8>NA<9>5<0>1999END"
> paste( ce0, m, sep="", collapse="")
[1] "1985<1>9<2>2<3>2<4>1<5>A1<7>NA<9>5<0>1999END"
> ce0a
 [1] "1985" "9"    "2"    "2"    "1"    "A"    "1"    ""     "NA"   "5"
[11] "1999"
> ce0
 [1] "1985" "9"    "2"    "2"    "1"    "A"    "1"    ""     "NA"   "5"
[11] "1999"
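
One way to narrow this down might be to drop collapse= and look at the
pasted pieces one position at a time, so that a missing piece of m shows up
at a specific index.  A minimal sketch, using stand-in vectors typed in by
hand (x and pieces are names made up for illustration; m has the values
shown above):

```r
# Stand-in for ce0, typed in by hand (so it behaves like ce0a above).
x <- c("1985", "9", "2", "2", "1", "A", "1", "", "NA", "5", "1999")
m <- c("<1>", "<2>", "<3>", "<4>", "<5>", "<6>", "<7>", "<8>", "<9>", "<0>", "END")

# Without collapse=, paste() returns one string per position.
pieces <- paste(x, m, sep = "")

# Collapsing afterwards gives the same single string as the one-step call.
joined <- paste(pieces, collapse = "")

print(pieces)
print(joined)
```

Running the same two steps on the real ce0 and comparing pieces[6] and
pieces[8] with the values here should show where the result first goes
wrong.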

I suspect there may be some hidden attributes somewhere in ce0 which I have
not noticed (there seem to be no factors).  The problem seems to arise with
the non-numerical columns (ce0 is just part of one row of the big
data frame).  Is it possible to find out what is different, and possibly to
change it?  At least attributes() shows nothing:
> attributes(ce0)
NULL
> attributes(ce0a)
NULL
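
Since attributes() shows nothing, a few other checks may reveal what print()
hides.  The sketch below uses stand-in vectors x and y (hypothetical names;
in the session above they would be ce0 and ce0a) and only functions from
current base R, which may not all exist in older versions:

```r
# Stand-ins for ce0 and ce0a; both are plain character vectors here.
x <- c("1985", "9", "2", "2", "1", "A", "1", "", "NA", "5", "1999")
y <- c(1985, 9, 2, 2, 1, "A", 1, "", "NA", 5, 1999)  # coerced to character

# TRUE only if type, length, attributes and every element agree.
same <- identical(x, y)

# Per-element checks.
diff_pos  <- which(x != y)             # positions where the strings differ
byte_len  <- nchar(x, type = "bytes")  # stored length of each string in bytes
raw_bytes <- lapply(x, charToRaw)      # the underlying bytes of each element

print(same)
print(diff_pos)
print(byte_len)
```

If the two real vectors print identically but identical() is FALSE, the
byte lengths and raw bytes are the places where a difference would most
likely show up.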

The actual problem is that I cannot convert a Stata 7 dataset to ASCII.  R
seems to be the only program here which is able to open it, but I still have
problems with saving.
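
For the saving step itself, write.table() may be enough once the data frame
has been read correctly.  A minimal sketch, assuming the read succeeded; dat
is a small made-up data frame standing in for the result of read.dta(), and
out_file is a hypothetical output path:

```r
# Stand-in for a data frame obtained from read.dta(); columns are made up.
dat <- data.frame(year = c(1985, 1999), code = c("A", ""),
                  stringsAsFactors = FALSE)

out_file <- tempfile(fileext = ".txt")  # hypothetical output path

# Tab-separated ASCII; strings are quoted so that empty fields stay visible,
# and missing values are written as the literal NA.
write.table(dat, file = out_file, sep = "\t", quote = TRUE,
            row.names = FALSE, na = "NA")

# Read the file back to check that the round trip keeps rows and columns.
back <- read.table(out_file, header = TRUE, sep = "\t",
                   stringsAsFactors = FALSE, na.strings = "NA")
print(back)
```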


On 9 Oct 2001, Peter Dalgaard BSA wrote:

> Ott Toomet <siim at obs.ee> writes:
>
> > Dear all,
> >
> > I have strange problems with paste.  Actually I suspect it is a bug (I
> > sent an associated bug report some days ago too).
>
> (And a whopping big one too...)
>
> > I have a vector ce0, a character vector m and I paste them together:
> >
> > > ce0
> >  [1] "1985" "9"    "2"    "2"    "1"    "A"    "1"    ""     "NA"   "5"
> > [11] "1999"
> >
> > > m
> >  [1] "<1>" "<2>" "<3>" "<4>" "<4>" "<6>" "<7>" "<8>" "<9>" "<0>" "END"
> >
> > > paste( ce0, m, sep="", collapse="")
> > [1] "1985<1>9<2>2<3>2<4>1<4>A1<7>NA<9>5<0>1999END"
> >
> >                             ^
> > I expected to get the components of the m vector lying cleanly ordered
> > between the components of ce0, but it isn't so.  Instead of "1 2 3 4 5
> > 6 7 8 9 0 END" you can see "1 2 3 4 4   7 9 0 END".  The main problem
> > arises from the "A1" in the row (marked with ^), where there is no
> > component from m in between the ce0 values at all!
>
> I don't think the problem is sitting in paste():
>
> > ce0<-c("1985","9","2","2","1","A","1","","NA","5","1999")
> > m <- c("<1>","<2>","<3>","<4>","<4>","<6>","<7>","<8>","<9>","<0>","END")
> > paste( ce0, m, sep="", collapse="")
> [1] "1985<1>9<2>2<3>2<4>1<4>A<6>1<7><8>NA<9>5<0>1999END"
>
>
> > However, this needs some background information.  ce0 is originally
> > from a large dataset (7000 obs x 1200 vars), read from a stata7-file
> > using read.dta:
> > > ce0 <- read.dta( "file.dta")[1,230:240]
> > So I am afraid ce0 contains some kind of extra information which I was
> > not able to figure out.
>
> What happens if you do a dput() on the vectors?
>
> It might be that reading that big file is causing an overflow
> somewhere, leading to a corrupted workspace -- i.e. that the bug is
> really in the foreign package.
