[Rd] A few suggestions and perspectives from a PhD student

Tue May 9 12:31:38 CEST 2017


On 09/05/17 11:22, Joris Meys wrote:
>
>
> On Tue, May 9, 2017 at 9:47 AM, Hilmar Berger 
> <berger at mpiib-berlin.mpg.de> wrote:
>
>     Hi,
>
>     On 08/05/17 16:37, Ista Zahn wrote:
>
>         One of the key strengths of R is that packages are not akin to
>         "fan created mods". They are a central and necessary part of the
>         R system.
>
>     I would tend to disagree here. The majority of R packages are
>     not maintained by the core R developers. Concepts, features and
>     lifetime depend mainly on the maintainers of the package (even
>     though in theory the GPL allows somebody to take over at any time).
>     Several packages that are critical for processing big data and
>     providing "modern" visualizations introduce concepts quite
>     different from the legacy S/R language. I do feel that in a way,
>     current core R strongly shows its origin in S, while modern
>     concepts (e.g. data.table, dplyr, ggplot, ...) are often only
>     available via extension packages. This is fine if one considers R
>     to be a statistical toolkit; as a programming language, however,
>     it introduces inconsistencies and uncertainties which could be
>     avoided if some of the "modern" parts (including language
>     concepts) could be more integrated in core-R.
>
>     Best regards,
>     Hilmar
>
>
> And I would tend to disagree here. R is built upon the paradigm of a 
> functional programming language, and falls in the same group as 
> Clojure, Haskell and the like. It is a Turing-complete programming 
> language on its own. That's quite a bit more than "a statistical 
> toolkit". You can say that about e.g. the macro language of SPSS, but 
> not about R.
>
My point was that inconsistencies are harder to tolerate when using R as 
a programming language as opposed to a toolkit that just has to do a job.
> Second, there's little "modern" about the ideas behind the tidyverse. 
> Piping is about as old as Unix itself. The grammar of graphics, on 
> which ggplot is based, stems from the SYSTAT graphics system from the 
> nineties. Hadley and colleagues did (and do) a great job implementing 
> these ideas in R, but the ideas do have a respectable age.
Those ideas still seem more modern than, e.g., stock R graphics, which 
were probably designed in the seventies or eighties. Stock graphics still 
do their job for lots and lots of applications. However, the fact that 
many newer packages use ggplot instead of plot() forces users to learn 
and use different paradigms for things as simple as drawing a line.
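
To make the "drawing a line" contrast concrete, here is a minimal sketch 
of the same line drawn both ways. The data frame df is made-up toy data, 
and the ggplot2 part is guarded so that it only runs if that package 
happens to be installed.

```r
# Toy data, invented purely for illustration.
df <- data.frame(x = 1:10, y = (1:10)^2)

# Base graphics: a single call; the line is requested via 'type'.
plot(df$x, df$y, type = "l")

# ggplot2: a plot object is built from data, an aesthetic mapping and a
# layer, and is only drawn when printed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(df, ggplot2::aes(x = x, y = y)) +
    ggplot2::geom_line()
  print(p)
}
```

Both calls give a line plot of the same data, but the first is an 
imperative drawing command while the second composes an object.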

I also would like to make clear that I do not advocate for including the 
whole tidyverse in core R. I just believe that having core concepts well 
supported in core R instead of implemented in a package might make 
things more consistent. E.g. method chaining ("%>%") is a core language 
feature in many languages.
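
As an illustration of why chaining currently lives in packages, here is 
a minimal sketch of a user-defined chaining operator in base R. The name 
%then% and the helper squares() are hypothetical, and this is a 
deliberately simplified stand-in, not how magrittr implements "%>%".

```r
# Hypothetical operator: passes the left-hand value to the function on
# the right. Any %name% operator can be defined this way in base R.
`%then%` <- function(lhs, f) f(lhs)

# Hypothetical helper used only for this example.
squares <- function(x) x^2

# Chained form, read left to right.
result <- c(1, 2, 3) %then% squares %then% sum

# The same computation written as nested calls, read inside out.
nested <- sum(squares(c(1, 2, 3)))

stopifnot(identical(result, nested))
```

Because the operator is an ordinary function, every package is free to 
define its own variant, which is the kind of inconsistency meant above.
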
>
> The one thing I would like to see though, is the adaptation of the 
> statistical toolkit so that it can work with data.table and tibble 
> objects directly, as opposed to having to convert to a data.frame once 
> you start building the models. And I believe that eventually there 
> will be a replacement for the data.frame that increases R's 
> performance and lessens its burden on the memory.
>
Which is a perfect example of what I mean: improved functionality should 
find its way into core R at some point, replacing or extending 
outdated functionality. Otherwise, I don't know how hard it will be to 
develop 21st century methods on top of a 1980s/90s language core. 
Although I admit that the R developers are doing a great job to make it 
possible.

Best,
Hilmar

> So all in all, I do admire the tidyverse and how it speeds up data 
> preparation for analysis. But tidyverse is a powerful data toolkit, 
> not a programming language. And it won't make R a programming language 
> either. Because R already is one.
>
> Cheers
> Joris
>
>
>     -- 
>     Dr. Hilmar Berger, MD
>     Max Planck Institute for Infection Biology
>     Charitéplatz 1
>     D-10117 Berlin
>     GERMANY
>
>     Phone: + 49 30 28460 430
>     Fax: + 49 30 28460 401
>     E-Mail: berger at mpiib-berlin.mpg.de
>     Web   : www.mpiib-berlin.mpg.de
>
>
>     ______________________________________________
>     R-devel at r-project.org mailing list
>     https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
>
> -- 
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
>
> tel :  +32 (0)9 264 61 79
> Joris.Meys at Ugent.be
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

-- 
Dr. Hilmar Berger, MD
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin
GERMANY

Phone:  + 49 30 28460 430
Fax:    + 49 30 28460 401
  
E-Mail: berger at mpiib-berlin.mpg.de
Web   : www.mpiib-berlin.mpg.de

