bug(?) in str() with strict.width = "cut" when applied to dataframe with numeric component AND factor or character component with longer levels/strings
Duncan Murdoch
murdoch.duncan at gmail.com
Wed Oct 16 11:24:53 CEST 2013
On 13-10-16 4:59 AM, Gerrit Eichner wrote:
> Dear Duncan,
>
> unfortunately, I have to correct myself: I _can_ reproduce the
> problem after changing the global width option to, say, 70. Using the data
> frame X from before with the 'factory-fresh' setting for width and
> executing
>
>
>> str( X, strict.width = "cut")
> 'data.frame': 11 obs. of 2 variables:
> $ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+05 ...
> $ B: Factor w/ 1 level "zjtvorkmoydsepnxkabmeondrjaanutjmfxlgzmrbjp": 1 1 1 1..
>
>
> produces the correct output. But
>
>
>> oo <- options( width = 70)
>> str( X, strict.width = "cut")
> 'data.frame': 11 obs. of 2 variables:
> $ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
> $ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
>
>
> is obviously the wrong output I reported previously. Restoring the old
> options "solves" the problem:
>
>
>> options( oo)
>> str( X, strict.width = "cut")
> 'data.frame': 11 obs. of 2 variables:
> $ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+05 ...
> $ B: Factor w/ 1 level "zjtvorkmoydsepnxkabmeondrjaanutjmfxlgzmrbjp": 1 1 1 1..
>
>
> Is that reproducible for you?
Yes, got it now. I'll take a look.
Duncan Murdoch
>
> Regards -- Gerrit
>
>
> PS: "New" session info:
>
>> sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] fortunes_1.5-0
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.2
>
> On Wed, 16 Oct 2013, Gerrit Eichner wrote:
>
>> Thanks, Duncan,
>>
>> for the good (indirect) hint: after a restart of R the problem is --
>> fortunately :-) -- not reproducible anymore for me either. The R session had
>> been running for a long time, and I recall doing some (system-related)
>> things outside of R that may have interfered with it; I just forgot to take
>> that possibility into consideration. :(
>>
>> Regards -- Gerrit
>>
>> On Tue, 15 Oct 2013, Duncan Murdoch wrote:
>>
>>> On 15/10/2013 7:53 AM, Gerrit Eichner wrote:
>>>> Dear list subscribers,
>>>>
>>>> here is a small artificial example to demonstrate the problem that I
>>>> encountered when looking at the structure of a (larger) data frame that
>>>> comprised (among other components)
>>>>
>>>> a numeric component with values of the order of 10000 and above, and
>>>>
>>>> a factor or character component with longer levels/strings:
>>>>
>>>>
>>>> k <- 43 # length of levels or character strings
>>>> n <- 11 # number of rows of data frame
>>>> M <- 10000 # order of magnitude of numerical values
>>>>
>>>> set.seed( 47) # to reproduce the following artificial character string
>>>> longer.char.string <- paste( sample( letters, k, replace = TRUE),
>>>>                              collapse = "")
>>>>
>>>> X <- data.frame( A = 1:n * M,
>>>>                  B = rep( longer.char.string, n))
>>>>
>>>>
>>>> The following call to str() apparently gives a wrong result:
>>>>
>>>> str( X, strict.width = "cut")
>>>>
>>>> 'data.frame': 11 obs. of 2 variables:
>>>> $ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
>>>> $ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
>>>>
>>>>
>>>> whereas the correct result appears for str( X) or if you decrease k to 42
>>>> (isn't that "the answer"? ;-) ) or n to 10 or M to 1000 (or smaller,
>>>> respectively).
>>>>
>>>>
>>>> I tried to dig into the entrails of str.default(), where the cause may
>>>> lie, but got lost pretty soon. So I am hoping that someone may already
>>>> have a workaround or patch (or dares to dig further). Thank you for any
>>>> feedback!
>>>
>>> I can't reproduce this. I don't have a 64-bit copy of 3.0.2 handy, but I
>>> don't see it in 64-bit 3.0.1, or 64-bit 3.0.2-patched, or various 32-bit
>>> versions.
>>>
>>> Is it reproducible for you? It looks to me as though (if it isn't just
>>> something weird on your system, e.g. an old copy of str() in your
>>> workspace), it might be a memory protection problem: something needed to
>>> be duplicated but wasn't. But unless I can see it happen, I can't start to
>>> fix it.
>>>
>>> Duncan Murdoch
>>>
>>>>
>>>> Best regards -- Gerrit
>>>>
>>>> PS:
>>>>
>>>>> sessionInfo()
>>>>
>>>> R version 3.0.2 (2013-09-25)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
>>>> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=German_Germany.1252
>>>>
>>>> attached base packages:
>>>> [1] splines stats graphics grDevices utils datasets
>>>> [7] methods base
>>>>
>>>> other attached packages:
>>>> [1] nparcomp_2.0 multcomp_1.2-21 mvtnorm_0.9-9996
>>>> [4] car_2.0-19 Hmisc_3.12-2 Formula_1.1-1
>>>> [7] survival_2.37-4 fortunes_1.5-0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] cluster_1.14.4 grid_3.0.2 lattice_0.20-23 MASS_7.3-29
>>>> [5] nnet_7.3-7 rpart_4.1-3 stats4_3.0.2 tools_3.0.2
>>>>
