[R] bug(?) in str() with strict.width = "cut" when applied to data frame with numeric component AND factor or character component with longer levels/strings

Gerrit Eichner Gerrit.Eichner at math.uni-giessen.de
Wed Oct 16 10:59:22 CEST 2013


Dear Duncan,

unfortunately, I have to correct myself: I _can_ reproduce the problem 
after changing the global width option to, say, 70. Using the data frame 
X from before with the 'factory-fresh' setting for width (80) and 
executing


> str( X, strict.width = "cut")
'data.frame':   11 obs. of  2 variables:
  $ A: num  1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+05 ...
  $ B: Factor w/ 1 level "zjtvorkmoydsepnxkabmeondrjaanutjmfxlgzmrbjp": 1 1 1 1..


produces the correct output. But


> oo <- options( width = 70)
> str( X, strict.width = "cut")
'data.frame':   11 obs. of  2 variables:
  $ A: num  1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
  $ A: num  1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..


is obviously the wrong output I reported previously. Restoring the old 
options "solves" the problem:


> options( oo)
> str( X, strict.width = "cut")
'data.frame':   11 obs. of  2 variables:
  $ A: num  1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+05 ...
  $ B: Factor w/ 1 level "zjtvorkmoydsepnxkabmeondrjaanutjmfxlgzmrbjp": 1 1 1 1..
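
The comparison above can also be scripted so that the width option is 
never left changed. This is only a minimal sketch, assuming the X defined 
in the quoted message below; the helper name str_lines is hypothetical, 
and stringsAsFactors = TRUE is set explicitly (an assumption) so that B 
stays a factor on newer R versions:

```r
# Rebuild the example data frame from the original report.
k <- 43      # length of levels or character strings
n <- 11      # number of rows of data frame
M <- 10000   # order of magnitude of numerical values

set.seed(47)
longer.char.string <- paste(sample(letters, k, replace = TRUE),
                            collapse = "")

X <- data.frame(A = 1:n * M,
                B = rep(longer.char.string, n),
                stringsAsFactors = TRUE)  # assumption: keep B a factor

# Hypothetical helper: capture the lines printed by str() at a given
# width and restore the previous width option on exit.
str_lines <- function(x, width) {
  oo <- options(width = width)
  on.exit(options(oo))
  capture.output(str(x, strict.width = "cut"))
}

for (w in c(80, 70)) {
  out <- str_lines(X, w)
  # After the header line, each component should appear once, in order.
  ok <- grepl("^ *\\$ A:", out[2]) && grepl("^ *\\$ B:", out[3])
  cat("width =", w,
      if (ok) "looks fine" else "shows the duplicated line", "\n")
}
```

If the second component line starts with "$ A:" instead of "$ B:", the 
session shows the behaviour reported here.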


Is that reproducible for you?
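
To rule out the possibility raised in the quoted message below (an old 
copy of str() in the workspace), a check along the following lines might 
help. It is a sketch only; the variable names where and is_utils_str are 
hypothetical:

```r
# Where is an object named "str" found on the search path? In a fresh
# session this would normally be "package:utils" only.
where <- find("str")
print(where)

# Is the str() that gets called the one exported by utils?
is_utils_str <- identical(str, utils::str)
print(is_utils_str)

# Calling the namespaced version sidesteps any masking copy.
utils::str(1:3)
```

If where lists ".GlobalEnv" or is_utils_str is FALSE, a masking copy of 
str() is in play.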

   Regards  --  Gerrit


PS: "New" session info:

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] fortunes_1.5-0

loaded via a namespace (and not attached):
[1] tools_3.0.2





On Wed, 16 Oct 2013, Gerrit Eichner wrote:

> Thanks, Duncan,
>
> for the good (indirect) hint: after a restart of R the problem is -- 
> fortunately :-) -- not reproducible anymore for me either. The R session had 
> been running for a longer time and I recall doing some (system-related) 
> things outside of R that may have interfered with it; I just forgot to take 
> that possibility into consideration. :(
>
> Regards  --  Gerrit
>
> On Tue, 15 Oct 2013, Duncan Murdoch wrote:
>
>> On 15/10/2013 7:53 AM, Gerrit Eichner wrote:
>>> Dear list subscribers,
>>> 
>>> here is a small artificial example to demonstrate the problem that I
>>> encountered when looking at the structure of a (larger) data frame that
>>> comprised (among other components)
>>> 
>>> a numeric component of elements of the order of > 10000, and
>>> 
>>> a factor or character component with longer levels/strings:
>>> 
>>> 
>>> k <- 43      # length of levels or character strings
>>> n <- 11      # number of rows of data frame
>>> M <- 10000   # order of magnitude of numerical values
>>> 
>>> set.seed( 47) # to reproduce the following artificial character string
>>> longer.char.string <- paste( sample( letters, k, replace = TRUE),
>>>                                collapse = "")
>>> 
>>> X <- data.frame( A = 1:n * M,
>>>                    B = rep( longer.char.string, n))
>>> 
>>> 
>>> The following call to str() apparently gives a wrong result
>>> 
>>> str( X, strict.width = "cut")
>>> 
>>> 'data.frame':   11 obs. of  2 variables:
>>>    $ A: num  1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
>>>    $ A: num  1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
>>> 
>>> 
>>> whereas the correct result appears for str( X) or if you decrease k to 42
>>> (isn't that "the answer"? ;-) ) or n to 10 or M to 1000 (or smaller,
>>> respectively).
>>> 
>>> 
>>> I tried to dig into the entrails of str.default(), where the cause may
>>> lie, but got lost pretty soon. So, I am hoping that someone may already
>>> have a work-around or patch (or dares to dig further)? Thank you for any
>>> feedback!
>> 
>> I can't reproduce this.  I don't have a 64 bit copy of 3.0.2 handy, but I 
>> don't see it in 64 bit 3.0.1, or 64 bit 3.0.2-patched, or various 32 bit 
>> versions.
>> 
>> Is it reproducible for you?  It looks to me as though (if it isn't just 
>> something weird on your system, e.g. an old copy of str() in your 
>> workspace), it might be a memory protection problem:  something needed to 
>> be duplicated but wasn't.  But unless I can see it happen, I can't start to 
>> fix it.
>> 
>> Duncan Murdoch
>> 
>>>
>>>    Best regards  --  Gerrit
>>> 
>>> PS:
>>> 
>>> > sessionInfo()
>>> 
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>> 
>>> locale:
>>> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
>>> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
>>> [5] LC_TIME=German_Germany.1252
>>> 
>>> attached base packages:
>>> [1] splines   stats     graphics  grDevices utils     datasets
>>> [7] methods   base
>>> 
>>> other attached packages:
>>> [1] nparcomp_2.0     multcomp_1.2-21  mvtnorm_0.9-9996
>>> [4] car_2.0-19       Hmisc_3.12-2     Formula_1.1-1
>>> [7] survival_2.37-4  fortunes_1.5-0
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] cluster_1.14.4  grid_3.0.2      lattice_0.20-23 MASS_7.3-29
>>> [5] nnet_7.3-7      rpart_4.1-3     stats4_3.0.2    tools_3.0.2
>>> 
>>> ---------------------------------------------------------------------
>>> Dr. Gerrit Eichner                   Mathematical Institute, Room 212
>>> gerrit.eichner at math.uni-giessen.de   Justus-Liebig-University Giessen
>>> Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
>>> Fax: +49-(0)641-99-32109        http://www.uni-giessen.de/cms/eichner
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.


