[Rd] vctrs: a type system for the tidyverse

Joris Meys
Thu Aug 9 14:55:30 CEST 2018

Hi Hadley,

my point actually came from a data analyst's point of view. A character
variable is something used for extra information, e.g. the "any other ideas?"
field of a questionnaire. A categorical variable is a variable describing
categories defined by the researcher. If it is made clear that a factor is
the object type needed for a categorical variable, there is no confusion.
All my students get it. But I agree that in many cases people are taught
that a factor is somehow related to character variables. That does not
make sense from a data analyst's point of view if you think about variables
as continuous, ordinal and nominal in a model context.

So I don't think adding more confusing behaviour and pitfalls is a solution
to something that's essentially a misunderstanding. That can only be
solved by explaining it correctly, imho.


On Thu, Aug 9, 2018 at 2:36 PM Hadley Wickham <h.wickham using gmail.com> wrote:

> I 100% agree with you, and this is the behaviour that vctrs used to
> have and dplyr currently has (at least in bind_rows()). But
> pragmatically, my experience with dplyr is that people find this
> behaviour confusing and unhelpful. And when I played with the full
> expression of this behaviour in vctrs, I found that it forced me to
> think about the levels of factors more than I'd otherwise like to: it
> made me think like a programmer, not like a data analyst. So in an
> ideal world, yes, I think factors would have stricter behaviour, but
> my sense is that imposing this strictness now will be onerous to most
> analysts.
> Hadley
> --
> http://hadley.nz

Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)

Biowiskundedagen 2017-2018

