[Rd] I've written a big review of R. Can I get some feedback?

Taras Zakharko t@r@@@z@kh@rko @end|ng |rom uzh@ch
Wed Apr 13 09:50:05 CEST 2022


Hi Reece, 

> Thanks for the feedback. Much of what you've said seems to agree with a common trend that I've seen in other feedback. Namely, you seem to agree with the many that have told me that using R as anything other than as a tool for data analysis was a grave mistake. I'm increasingly starting to suspect that you're all right. I therefore have little to no counters to your points.

We have used R to develop fairly complex data transformation pipelines (which include custom data validation, a custom async task scheduling layer and other non-trivial components), and I regularly use R to prototype complex data structure algorithms before implementing them in something more low-level like C. It can be a pain sometimes, of course, but R is very well suited for data-oriented design, especially when you leverage the excellent low-level work done by the tidyverse group (especially rlang, vctrs and purrr). Of course, if you are used to OOP everywhere it will be very tough, but it’s the 21st century after all, not the nineties :)

Overall, I would say that there are three big issues with R that I don’t think can be fixed (the fourth, smaller issue is the standard library, but that’s a different discussion):

1. The type system is very weak and idiosyncratic, which makes more complex applications difficult and error-prone.

2. The language itself is unsound. R’s semantics have been developed on the fly, with new features driven by pragmatic necessity. R is essentially LISP with C-like syntax and a very leaky runtime; all this makes it very powerful but also offers you multiple ways to shoot yourself in the foot. A very simple example: lazy evaluation in R is everywhere, but lazy expressions can contain side effects, which equals FUN. R’s philosophy here is very pragmatic: the user is responsible. And it works very well most of the time, until it doesn’t.

3. The performance is bad, very bad. This is due to the fact that R’s runtime uses linked lists (arguably the worst data structure for modern hardware) everywhere and you can’t do even the simplest operation without performing multiple memory allocations. R’s core team has done some fantastic work in recent years, like the inclusion of the byte code compiler, lazy data types (ALTREP) etc., but there is only so much one can do under the circumstances.
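
To make the lazy evaluation point in item 2 concrete, here is a minimal sketch in base R. The helper names (`note`, `ignores_arg`, `uses_arg`) are made up for illustration; the only assumption is that arguments are passed as promises and evaluated on first use.

```r
# Arguments are promises: they are evaluated only when (and if) first used,
# so a side effect inside an argument expression may run late or never.
events <- character()

note <- function(msg) {
  events <<- c(events, msg)   # side effect: append to a global record
  msg
}

ignores_arg <- function(x) "done"                  # never touches x
uses_arg    <- function(x) { note("body"); x }     # touches x only at the end

ignores_arg(note("arg"))        # the promise is never forced
events_after_ignore <- events   # still empty

uses_arg(note("arg"))           # the body runs first, then the promise is forced
events_after_use <- events      # "body" comes before "arg"
```

Whether the side effect happens, and in which order, depends entirely on what the callee does with its argument, which is exactly the "user is responsible" situation.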

I do think one could make a next-gen R by giving it sound evaluation semantics, a consistent (stricter) type system, and first-class support for metaprogramming hygiene, dropping the linked list implementation in favour of something modern like immutable data structures, etc., and it would be a very nifty and powerful little data processing language. Unfortunately, it would also break all the existing code, rendering it fairly useless, because if one goes through all that effort one might as well just migrate to Julia.

In the end, R might be frustrating at times and its implementation is dated, but it is still very much usable. And of course, it is the language we love and cherish, and so we carry on :)

Best, 

Taras




> On 12 Apr 2022, at 23:31, Reece Goding <Reece.Goding using outlook.com> wrote:
> 
> Hi Gabriel,
> 
> Thanks for the feedback. Much of what you've said seems to agree with a common trend that I've seen in other feedback. Namely, you seem to agree with the many that have told me that using R as anything other than as a tool for data analysis was a grave mistake. I'm increasingly starting to suspect that you're all right. I therefore have little to no counters to your points.
> 
> As for what you've said in reply to my "mapply challenge", I admit that your response is logical and may even be the best possible answer. However, I find it disturbing that the solution to my puzzle appears to rest on having a very careful and very specific understanding of what the words "vectorize over" mean in the documentation. You could well be right, but it doesn't sit well with me.
> 
> I'll further consider what you've said about the rest. I'm already making some changes.
> 
> Thanks again,
> Reece
> 
> ________________________________________
> From: Gabriel Becker <gabembecker using gmail.com>
> Sent: 12 April 2022 00:28
> To: Toby Hocking; Reece.Goding using outlook.com
> Cc: r-devel using r-project.org
> Subject: Re: [Rd] I've written a big review of R. Can I get some feedback?
> 
> Hi Reece,
> 
> I'm not really sure what kind of review you're looking for (and I'm not certain this is the right place for it, but hopefully it's OK enough). Also, to channel Pascal, forgive me, I would have written a shorter response but I didn't have the time.
> 
> Firstly, it is fairly ... partisan, I suppose, for lack of a better term.
> 
> More importantly from a usefulness perspective you often notably don't present the knowledge you gained at the end of the various frustrations you had. As one example that jumped out to me, you say
> 
> "One day, you’ll be tripped up by R’s hierarchy of how it likes to simplify mixed types outside of lists. "
> 
> but you don't present your readers with the (well defined) coercion hierarchy so that they would, you know, not be tripped up by it as badly. This is probably my largest issue with your document overall. It can give the reader talking points about how R is bad (not all of which are even incorrect, per se, as many expert R users will be happy to tell you), but it won't really help people become better R users in many cases.
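> 
> For reference, a short sketch of that coercion hierarchy as it shows up in c(). The ordering used here is the one given in ?c for atomic types (logical < integer < double < complex < character); the values are arbitrary examples.
> 
> ```r
> # c() coerces mixed atomic inputs to the highest type present:
> # logical < integer < double < complex < character
> t1 <- typeof(c(TRUE, 1L))     # "integer"
> t2 <- typeof(c(1L, 2.5))      # "double"
> t3 <- typeof(c(2.5, 1i))      # "complex"
> t4 <- typeof(c(1i, "a"))      # "character"
> 
> # Everything ends up as character once a single string is involved
> mixed <- c(TRUE, 2L, 3.5, "x")   # "TRUE" "2" "3.5" "x"
> ```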
> 
> Your article also, I suspect, fails to understand what a typical "Novice R User" is and what they want to do. By and large they want to analyze data and create plots. They are analysts, NOT programmers (writing analysis scripts is not programming in the typical sense, and I'm not the only one who thinks that).
> 
> So the point you make early on, when explaining why you do not strongly recommend R For Data Science (which I had no part in writing and have not read myself), namely
> 
> "It deliberately avoids the fundamentals of programming – e.g. making functions, loops, and if statements – until the second half. I therefore suspect that any non-novice would be better off finding an introduction to the relevant packages with their favourite search engine."
> 
> misses the point of R itself for what I'd claim is the "typical novice R user".
> 
> Having read through your review, I'm confused why you were using R to do some of the things I'm inferring that you felt like you needed it to do. If you picked up R wanting a general-purpose language that is equally applicable to all programming problem domains, you're going to have a bad time, mostly because that is not what R is.
> 
> Finally, a (very) incomplete response to a few of the more specific points raised in your review:
> 
> Lists:
> 
> The linked stack overflow question (https://stackoverflow.com/questions/2050790/how-to-correctly-use-lists-in-r) shows a pretty fundamental misunderstanding of what lists and atomic vectors are/do in R. There is nothing wrong with this, asking questions we don't know the answer to is how we learn, but I'm not sure the question serves as well as a primer for R lists as you claim. The top answer at time of writing discusses the C level structure of R objects, which can, I suppose, inform your knowledge on how lists at the R level work, but is NOT necessary nor the most pedagogically useful way to present it.
> 
> Strings:
> 
> Strings are not arrays of characters idiomatically at the R level, they are atomic observed values within a (character) vector of data. Yes, deep down in the C code they are arrays of characters, but not at the R level. As such, splitting the elements of a character vector into their respective component individual characters is not (at all, in my experience) a common operation. charvec[1] within typical R usage (where charvec is a vector of data) is much more likely to be intended to select the first observation for the data vector, which it does. Given what R is for, frankly I think it'd be fairly insane for charvec[1] to do what substr does.
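> 
> A small sketch of that distinction, using a made-up vector named charvec as in the paragraph above and only base R functions:
> 
> ```r
> charvec <- c("apple", "banana", "cherry")
> 
> first_obs    <- charvec[1]                      # "apple": first observation, not first letter
> first_chars  <- substr(charvec, 1, 1)           # "a" "b" "c": first character of each element
> split_letters <- strsplit(charvec[1], "")[[1]]  # "a" "p" "p" "l" "e"
> 
> n_obs   <- length(charvec)   # 3: number of observations
> n_chars <- nchar(charvec)    # 5 6 6: characters per observation
> ```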
> 
> Variable Manipulation
> 
> Novice users shouldn't be calling eval. This is not to gatekeep it from them, like we have some special "eval-callers" club that they're not invited to. Rather, it is me saying that metaprogramming is not a novice-difficulty task in R (or, I would expect, anywhere else really).
> 
> You also say "variable names" in this section where you mean "argument names", and that distinction is both meaningful and important. Variable names are not partially matched:
> 
> > xyz <- 5
> > x
> Error: object 'x' not found
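> 
> By contrast, argument names in a call are partially matched. A minimal sketch, where the function f and its arguments are hypothetical and chosen only for illustration:
> 
> ```r
> f <- function(value, verbose = FALSE) if (verbose) "loud" else value
> 
> r1 <- f(val = 1)           # `val` partially matches the argument `value`
> r2 <- f(1, verb = TRUE)    # `verb` partially matches `verbose`
> 
> # Variables get no such treatment: looking up a prefix of a name fails
> xyz <- 5
> r3 <- tryCatch(get("xy"), error = function(e) "not found")
> ```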
> 
> Subsetting:
> 
> I'm fairly certain arrays (including 2d matrices) are stored in column order rather than row order because that has been the standard for linear algebra on computers since before I knew what either of those things were...
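> 
> A quick sketch of what column order means in practice, using only base R and arbitrary example values:
> 
> ```r
> m <- matrix(1:6, nrow = 2)   # filled down the columns: rows are 1 3 5 and 2 4 6
> 
> a <- m[1, 2]        # 3
> b <- m[3]           # 3: a single index walks the data column by column
> v <- as.vector(m)   # 1 2 3 4 5 6: the underlying storage order
> 
> # byrow = TRUE changes how the matrix is filled, not how it is stored
> mr <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows are 1 2 3 and 4 5 6
> d  <- mr[3]                                 # 2
> ```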
> 
> tail(x,1) is the idiomatic way of getting the last element of a vector. The people on stackoverflow that told you this was "very slow" were misguided at best. It takes ~6000 nanoseconds on my laptop, compared to ~200 nanoseconds for x[length(x)]. Yes, that is a 30x speedup; no, it doesn't matter in practice.
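> 
> A rough way to check this yourself, as a sketch only: the vector and the repetition count are arbitrary, and absolute timings will differ from machine to machine.
> 
> ```r
> x <- c(10, 20, 30)
> 
> last_tail  <- tail(x, 1)      # idiomatic
> last_index <- x[length(x)]    # faster, same result for a plain vector
> 
> # Crude timing over many repetitions; both finish almost instantly
> t_tail  <- system.time(for (i in 1:1e4) tail(x, 1))
> t_index <- system.time(for (i in 1:1e4) x[length(x)])
> ```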
> 
> I'm going to stop now because this is already too long, but this type of response continues to be possible throughout.
> 
> Lastly, with regard to your mapply challenge, I quote directly from the documentation (emphasis mine):
> 
> 
>      ...: arguments to vectorize over (vectors or lists of strictly
>           positive length, or all of zero length). See also ‘Details’.
> 
> MoreArgs: a list of other arguments to ‘FUN’.
> 
> 
> 
> ... holds the arguments you vectorize over, so FUN gets one element of each thing in ... for each call. MoreArgs, then, is the set of arguments to FUN which you don't vectorize over, i.e. where each call to FUN gets the whole thing. That's it, that's the whole thing.
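> 
> A minimal sketch of that difference; the function f and its argument names are hypothetical and only there to show which values each call receives.
> 
> ```r
> f <- function(x, y, z) x + y + sum(z)
> 
> # z passed through ...: vectorized over, each call gets ONE element of z
> per_element <- mapply(f, 1:3, 4:6, z = 7:9)
> # calls: f(1, 4, 7), f(2, 5, 8), f(3, 6, 9)  ->  12 15 18
> 
> # z passed through MoreArgs: every call gets the WHOLE of z
> whole <- mapply(f, 1:3, 4:6, MoreArgs = list(z = c(10, 20)))
> # calls: f(1, 4, c(10, 20)), f(2, 5, c(10, 20)), ...  ->  35 37 39
> ```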
> 
> 
> I don't disagree that this could be clearer (as Ben pointed out, a documentation patch would be the way to address this), but it's not correct to say the information isn't in there at all.
> 
> 
> Best,
> 
> ~G
> 
> On Mon, Apr 11, 2022 at 1:52 PM Toby Hocking <tdhock5 using gmail.com> wrote:
> You could take some of your observations and turn them into patches that
> would help improve R. (discussion of such patches is one function of this
> email list)
> 
> On Sun, Apr 10, 2022 at 9:05 AM Stephen H. Dawson, DSL via R-devel <r-devel using r-project.org> wrote:
> 
>> Hi Reece,
>> 
>> 
>> Thanks for the article. What specific feedback do you seek for your
>> writing?
>> 
>> 
>> Kindest Regards,
>> *Stephen Dawson, DSL*
>> /Executive Strategy Consultant/
>> Business & Technology
>> +1 (865) 804-3454
>> http://www.shdawson.com
>> 
>> 
>> On 4/9/22 15:52, Reece Goding wrote:
>>> Hello,
>>> 
>>> For a while, I've been working on writing a very big review of R. I've finally finished my final proofread of it. Can I get some feedback? This seems the most appropriate place to ask. It's linked below.
>>> 
>>> https://github.com/ReeceGoding/Frustration-One-Year-With-R
>>> 
>>> If you think you've seen it before, that will be because it found some popularity on Hacker News before I was done proofreading it. The reception seems largely positive so far.
>>> 
>>> Thanks,
>>> Reece Goding
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>> 
> 
