[Rd] duplicated factor labels.

Joris Meys jorismeys at gmail.com
Fri Jun 23 14:57:42 CEST 2017

On Fri, Jun 23, 2017 at 2:20 PM, Uwe Ligges <ligges at statistik.tu-dortmund.de> wrote:

> I had the chance to look at > 1300 SPSS files our consulting center
> collected during the last 20 years, and in several hundred cases we found
> such a problem that was a copy & paste error and simply wrong.
> Only in < 5 cases was condensing several levels into one appropriate,
> hence we decided to keep duplicated levels by changing the names as the
> default.

I understand where you're coming from. I know from personal experience
exactly how much this is a pain in the ass, but I also have to group
different labels into fewer categories in just about every data set I get
from clients or students, especially when things come from surveys with 30
different education categories etc.

So I would argue that checking for duplicate labels is a task for
read.spss() and can be added as an extra check if necessary. But I
personally don't see the fact that clients regularly mess up SPSS files as
enough of an argument to not change the behaviour of factor().
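
For illustration, such an extra check could look roughly like the sketch
below. The helper name check_duplicate_labels() and the sample labels are
hypothetical; this is not how read.spss() is implemented, only an example
of detecting repeated labels before a factor is built.

```r
# Hypothetical helper: report value labels that occur more than once.
check_duplicate_labels <- function(labels) {
  dup <- unique(labels[duplicated(labels)])
  if (length(dup) > 0) {
    warning("duplicated labels: ", paste(dup, collapse = ", "))
  }
  invisible(dup)
}

# Invented value labels, as they might come out of a survey file.
labs <- c("low", "medium", "medium", "high")
dups <- check_duplicate_labels(labs)  # warns and returns the repeated labels
```

Whether a repeated label is an error or an intended grouping is then left
to the caller to decide.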

> Based on this experience I'd propose not to touch factor but rather add a
> function that easily allows for this reduction, if we do not have that
> already.

There are already functions that allow this, like the tidyverse
dplyr::recode_factor() function. It's rather trivial to do with
logical operators and indices, and I have my own "recode" function so I
don't have to rely on any package or retype the same construct over and
over again with different values.
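
As an illustration of grouping levels with base R only (a minimal sketch,
not the "recode" function mentioned above), the example below merges
levels of an existing factor in two ways: by assigning a named list to
levels(), and by indexing the level vector. The education categories are
invented for the example.

```r
# Invented survey answers; the category names are hypothetical.
edu <- factor(c("primary", "lower secondary", "upper secondary",
                "bachelor", "master", "primary"))

# 1) Assign a named list: each name is a new level, each element lists
#    the old levels that are merged into it.
grouped <- edu
levels(grouped) <- list(
  basic     = "primary",
  secondary = c("lower secondary", "upper secondary"),
  higher    = c("bachelor", "master")
)

# 2) Use indices on the level vector; assigning repeated level names
#    merges the corresponding levels.
grouped2 <- edu
lv <- levels(grouped2)
lv[lv %in% c("lower secondary", "upper secondary")] <- "secondary"
lv[lv %in% c("bachelor", "master")] <- "higher"
lv[lv == "primary"] <- "basic"
levels(grouped2) <- lv

table(grouped)
```

Both variants give the same values; they differ only in the order of the
resulting levels, which the list form lets you choose explicitly.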

But a clean and logical way to recode/group different levels when
constructing the factor would, at least for me, be very convenient. But
I'm just a guy and I'm not writing the code, so in the end it's up to you.

Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel :  +32 (0)9 264 61 79
Joris.Meys at Ugent.be
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
