[Rd] vctrs: a type system for the tidyverse
jori@mey@ @ending from gm@il@com
Thu Aug 9 14:55:30 CEST 2018
My point actually came from a data analyst's point of view. A character
variable is something used for extra information, e.g. the "any other ideas?"
field of a questionnaire. A categorical variable is a variable describing
categories defined by the researcher. If it is made clear that a factor is
the object type needed for a categorical variable, there is no confusion.
All my students get it. But I agree that in many cases people are taught
that a factor is somehow related to character variables. And that does not
make sense from a data analyst point of view if you think about variables
as continuous, ordinal and nominal in a model context.
So I don't think adding more confusing behaviour and pitfalls is a solution
to something that's essentially a misunderstanding. It's something that's
only solved by explaining it correctly imho.
On Thu, Aug 9, 2018 at 2:36 PM Hadley Wickham <h.wickham using gmail.com> wrote:
> I 100% agree with you, and this is the behaviour that vctrs used to
> have and dplyr currently has (at least in bind_rows()). But
> pragmatically, my experience with dplyr is that people find this
> behaviour confusing and unhelpful. And when I played the full
> expression of this behaviour in vctrs, I found that it forced me to
> think about the levels of factors more than I'd otherwise like to: it
> made me think like a programmer, not like a data analyst. So in an
> ideal world, yes, I think factors would have stricter behaviour, but
> my sense is that imposing this strictness now will be onerous to most
Department of Data Analysis and Mathematical Modelling
Coupure Links 653, B-9000 Gent (Belgium)