[R] freetype 2.5.2, problem with the survival package, build R 2.15.x with gcc 4.8.x

Hin-Tak Leung htl10 at users.sourceforge.net
Wed Dec 18 01:06:30 CET 2013


------------------------------
On Fri, Dec 13, 2013 16:29 GMT David Winsemius wrote:

>
>On Dec 11, 2013, at 7:30 PM, Hin-Tak Leung wrote:
>
>> Here is a rather long discussion about freetype 2.5.2, a problem with the survival package, and building R 2.15.x with gcc 4.8.x. Please feel free to skip forward.
>> 
>> - freetype 2.5.2:
>> 
>> The fix for one of Mac OS X's system fonts, made just before the release of freetype 2.5.1, caused a regression: a crash on one of Microsoft Windows' system fonts. So there is a 2.5.2. There are new 2.5.2 bundles for Windows & Mac OS X. The official win/mac binaries of R were built statically with a 2+-year-old freetype with a few known problems. Most should upgrade/rebuild.
>> 
>> http://sourceforge.net/projects/outmodedbonsai/files/R/
>> 
>> - problem with the survival package:
>> 
>> Trying to re-run a vignette to get the same result as two years ago
>> revealed a strange change. I went and bisected it down to between
>> r11513 and r11516 of the survival package.
>> 
>> -------------- r11513 --------------------
>> clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set))
>> 
>> 
>>                   coef exp(coef) se(coef)     z      p
>> addContr(A)2     -0.620     0.538    0.217 -2.86 0.0043
>> addContr(C)2      0.482     1.620    0.217  2.22 0.0270
>> addContr(A.C)1-2 -0.778     0.459    0.275 -2.83 0.0047
>> addContr(A.C)2-1     NA        NA    0.000    NA     NA
>> addContr(A.C)2-2     NA        NA    0.000    NA     NA
>> 
>> Likelihood ratio test=26  on 3 df, p=9.49e-06  n= 13110, number of events= 3524
>> ------------------------------------------
>> 
>> ------------- r11516 ---------------------
>> clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set))
>> 
>> 
>>                     coef exp(coef) se(coef)         z  p
>> addContr(A)2     -0.14250     0.867   110812 -1.29e-06  1
>> addContr(C)2      0.00525     1.005   110812  4.74e-08  1
>> addContr(A.C)1-2 -0.30097     0.740   110812 -2.72e-06  1
>> addContr(A.C)2-1 -0.47712     0.621   110812 -4.31e-06  1
>> addContr(A.C)2-2       NA        NA        0        NA NA
>> 
>> Likelihood ratio test=26  on 4 df, p=3.15e-05  n= 13110, number of events= 3524
>> ------------------------------------------
>> 
>> r11514 does not build, and r11515 has a serious memory hog, so the survival
>> package broke somewhere between r11513 and r11516. Anyway, here is the diff in
>> the vignette, and the data, etc. are in the directory above. If somebody wants to
>> fix this before I spend any more time on this particular matter, please feel free to do so.
>> 
>> http://sourceforge.net/projects/outmodedbonsai/files/Manuals%2C%20Overviews%20and%20Slides%20for%20talks/2013SummerCourse/practicals/with-answers/practical8_survival-clogit-diff.pdf/download
>> 
>> That's the one problem from David's 10 practicals which is not due to bugs in snpStats. Some might find it reassuring that only 3 of the 4 problems with the practicals are due to snpStats bugs.
>> 
>> http://sourceforge.net/projects/outmodedbonsai/files/Manuals%2C%20Overviews%20and%20Slides%20for%20talks/2013SummerCourse/practicals/with-answers/practical7_snpStatsBug-diff.pdf/download
>> http://sourceforge.net/projects/outmodedbonsai/files/Manuals%2C%20Overviews%20and%20Slides%20for%20talks/2013SummerCourse/practicals/with-answers/practical6_snpStatsBug-diff.pdf/download
>> http://sourceforge.net/projects/outmodedbonsai/files/Manuals%2C%20Overviews%20and%20Slides%20for%20talks/2013SummerCourse/practicals/with-answers/practical3_snpStatsBug-diff.pdf/download
>> 
>> - build R 2.15.x with gcc 4.8.x
>> 
>> I wish the R commit log was a bit more detailed with r62430 than just
>> "tweak needed for gcc 4.8.x". Anyway, building R 2.15.x with gcc 4.8.x
>> could result in segfaults in usage as innocent and essential
>> as running summary() on a data.frame:
>> 
>> --------------------------------
>> *** caught segfault ***
>> address 0x2f8e6a00, cause 'memory not mapped'
>> 
>> Traceback:
>> 1: sort.list(y)
>> 2: factor(a, exclude = exclude)
>> 3: table(object, exclude = NULL)
>> 4: summary.default(X[[3L]], ...)
>> 5: FUN(X[[3L]], ...)
>> 6: lapply(X = as.list(object), FUN = summary, maxsum = maxsum, digits = 12,   ...)
>> 7: summary.data.frame(support)
>> ...
>> --------------------------------
>> 
>> r62430 needs a bit of adapting to apply to R 2.15.x, but you get the idea.
>> I hope this info is useful to somebody else who is still using R 2.15.x, no doubt for very good reasons.
>
>First: Sorry for the blank message. Need more coffee.
>
>Second: Does this mean that only Mac users who are still using 2.15.x need to worry about this issue?
>

The freetype issue affects both Windows and Mac users. Unix users have it easier, since R on unices (*excluding* Mac OS X) dynamically
links to the system's shared freetype, so upgrading at the system level would work. R for Windows and Mac OS X is statically linked to
a rather out-dated version of freetype.

The survival package issue affects everybody using a version of the survival package more recent than r11513:
Date:   Wed Feb 1 22:47:36 2012 +0000
    Remove "browser()" line from survobrien,
    add coxexact.fit

>Third: I'm reading this (and Terry's comment about singularity conditions) to mean that a numerical discrepancy between the vignette output when the code was run and what was expected was causing a segfault under some situation that I cannot quite reconstruct. Was the implication that Mac users (of 2.15.x) need to build from sources only if they wanted to build the survival package from source? Does this have any implications for those of us who use the survival package as the binary? (And I'm using 3.0.2, so a split answer might be needed to cover 2.15.x and the current versions separately)
>

Your comprehension of the issue seems to be entirely wrong. Between r11513 and r11516, some tuning of internal parameters was done, so the process of finding the rank of a singular matrix no longer converges (within the time/tolerance implicitly specified). Warnings are issued, but there are miscellaneous warnings before and after as well (and one gets "desensitised" to them). There is also the nature of the problem, which is to test for the possibility of interactions, or lack thereof:

outcome ~ factor A + factor B + factor A x factor B

or just extra terms in "outcome ~ factor A + factor B + ..." as an exploration of auxiliary effects. More often than not, the extra terms won't make
any difference and the matrix involved just isn't the nicest to manipulate; that is in the nature of this kind of exploratory work.

Professor Therneau replied that it is possible to get the older convergent behaviour by manually tuning some of the convergence criteria parameters. I responded that while that is possible, one is often simultaneously exploring many models with many possible auxiliary effects (or lack thereof), so manual tuning for each is neither feasible nor appropriate; and we sort of left it at that.
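
To illustrate why a tolerance setting alone can change how many coefficients come out as NA, here is a minimal generic sketch in C. It is not the survival package's code: the function name rank_with_tolerance, the fixed 3 x 3 size, and the pivot rule (a pivot counts as zero when it is below toler times the largest diagonal element) are all assumptions made for illustration.

```c
#include <assert.h>

#define N 3

/* Hypothetical helper, for illustration only (not taken from the survival
 * package). Eliminates a symmetric N x N matrix column by column and
 * counts the pivots that are not "numerically zero". A pivot is treated
 * as zero when it is below toler * (largest diagonal element). */
int rank_with_tolerance(const double a_in[N][N], double toler)
{
    double a[N][N];
    double max_diag = 0.0;
    int rank = 0;

    assert(toler > 0.0);

    /* Work on a copy and find the scale of the diagonal. */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            a[i][j] = a_in[i][j];
        if (a[i][i] > max_diag)
            max_diag = a[i][i];
    }
    const double eps = toler * max_diag;

    for (int k = 0; k < N; k++) {
        const double pivot = a[k][k];
        if (pivot < eps)
            continue;          /* redundant column: not counted */
        rank++;
        /* Subtract the contribution of column k from what remains. */
        for (int i = k + 1; i < N; i++) {
            const double f = a[i][k] / pivot;
            for (int j = k + 1; j < N; j++)
                a[i][j] -= f * a[k][j];
        }
    }
    return rank;
}
```

For a matrix whose third column is the sum of the first two apart from a perturbation of 1e-8 on the last diagonal element, a tolerance of 1e-6 treats the third column as redundant, while a tolerance of 1e-12 counts it as a full-rank direction. The data are identical in both cases; only the cut-off differs.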

BTW, I trimmed the vignette and the data down, from a 70MB thing to about 40k and about 20 lines of R code, and put them under *_trimmed.{Rcode,Rda} at
http://sourceforge.net/projects/outmodedbonsai/files/Manuals%2C%20Overviews%20and%20Slides%20for%20talks/2013SummerCourse/practicals/with-answers/

together with the outcomes (*.Rout.*) from R 3.1.0 (R dev trunk), R 2.15.x, and R 2.15.x with survival r11513.

There are also *_memoryhog.{Rcode,Rda}, for those who want to see what the memory hog problem with r11515 is. Obviously there are no Rout files, since I had to kill the R process to stop it hogging my system :-).

As for the gcc 4.8.x issue, I rather think describing r62430 as "tweak needed for gcc 4.8.x" is unfortunate. For those who haven't got
the R dev trunk history handy, r62430 put a zero at the end of an array of 16 to make it 17 elements long. Without it, as I wrote,
R 2.15.x built that way would segfault at very innocent things like doing a summary() on a data.frame. (r62431 is part of R 3.0.0 RC).

However, if putting a zero at the end of an array of 16 to make it 17 elements long is a "fix" for a segmentation fault, it must mean that the code has always been wrong, and that it had relied on the C compiler generously padding uninitialized memory with nulls for the code to work as intended beforehand. AFAIK, the Sun Studio compiler behaves that way, and so do a few proprietary unix systems' C compilers. It is notably not true for gcc (the gcc developers largely think programmers should write good code where the i's are dotted and the t's are crossed, instead of having the compiler protect them from their own oversights). Moreover, on recent Red Hat Fedora systems (where gcc 4.8.x is likely the first to land), uninitialized memory is explicitly filled with random non-nulls to foil malware which utilises and skips nulls (=no-ops) to jump to the next instruction the malware places in memory. So "tweak needed for gcc 4.8.x" just isn't a good description for that change.
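
As a hedged illustration of how one trailing zero can matter, here is a small C sketch. It assumes the array is a table of Shell-sort increments walked by a sort routine; that is a guess based only on the sort.list() frame in the traceback, not something confirmed in this thread. The names incs, NINCS and shell_sort are hypothetical, and the values are Sedgewick's 4^k + 3*2^(k-1) + 1 sequence, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NINCS 16

/* Hypothetical table: 16 increments plus a terminating 0, 17 elements. */
static const size_t incs[NINCS + 1] = {
    1073790977, 268460033, 67121153, 16783361, 4197377, 1050113,
    262913, 65921, 16577, 4193, 1073, 281, 77, 23, 8, 1, 0
};

/* Sort x[0..n-1] in ascending order with a Shell sort. */
void shell_sort(int *x, size_t n)
{
    size_t t = 0;

    assert(x != NULL || n == 0);

    /* Skip increments larger than the input. For n == 0 this scan
     * stops on the terminating 0 rather than running off the table. */
    while (incs[t] > n)
        t++;

    /* On the last pass t == 15; the step expression then evaluates
     * incs[16] before the test t < NINCS fails. With only 16 elements
     * that read is one past the end of the table, which is undefined
     * behaviour in C even though the value is never used. The 17th
     * element makes the read well defined. */
    for (size_t h = incs[t]; t < NINCS; h = incs[++t]) {
        for (size_t i = h; i < n; i++) {
            const int v = x[i];
            size_t j = i;
            while (j >= h && x[j - h] > v) {
                x[j] = x[j - h];
                j -= h;
            }
            x[j] = v;
        }
    }
}
```

With the 17-element table, the loop behaves the same under any conforming compiler. With a 16-element table, what the out-of-bounds read finds, and what the optimizer does with the loop, is up to the compiler and platform.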

>-- 
>David.
>> 
>> Hin-Tak Leung wrote:
>> The freetype people fixed the 2nd set of issues with system fonts shipped with
>> Mac OS X, and released 2.5.1 almost immediately after that. So there are
>> new bundles under http://sourceforge.net/projects/outmodedbonsai/files/R/ .
>> 
>> Just a reminder that the official R binaries for windows/mac OS X are statically
>> linked with rather dated versions of freetype with a few known issues. This
>> affects the cairo-based functionalities in R. So a rebuild is needed.
>> 
>> Most unix users should just upgrade their system's libfreetype, and
>> dynamic-linking should take care of the rest.
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>David Winsemius
>Alameda, CA, USA
>


