[R] Antwort: Re: Factors and Alternatives
David L Carlson
dcarlson at tamu.edu
Tue May 9 14:38:12 CEST 2017
I'm not sure I understand your question, but you can easily include all possible answers when you create the factor by using the levels= argument as Bob pointed out. Here is an example of values that range from 1 to 6, but value 3 is not represented. Notice that a factor level 3 is created even though it does not appear in the data:
> set.seed(42)
> x <- sample.int(6, 10, replace=TRUE)
> table(x)
x
1 2 4 5 6
1 1 3 3 2
> y <- factor(x, levels=1:6)
> y
[1] 6 6 2 5 4 4 5 1 4 5
Levels: 1 2 3 4 5 6
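
For the analysis-function case, here is a minimal sketch, assuming a hypothetical helper called count_scale() and a 1-to-6 scale (neither name comes from the thread). Because the levels are fixed up front, table() reports a zero for any answer nobody chose, so the result always has the same length:

```r
# Hypothetical helper: tabulate ratings against the full scale,
# so unchosen answers show up as 0 instead of being dropped.
count_scale <- function(x, scale = 1:6) {
  table(factor(x, levels = scale))
}

# Same values as in the example above, written out explicitly
x <- c(6, 6, 2, 5, 4, 4, 5, 1, 4, 5)
counts <- count_scale(x)
counts            # level 3 is present with a count of 0
length(counts)    # equals length(scale), here 6
```

A function built this way should not depend on which answers happen to survive filtering, as long as the full scale is passed in.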
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of G.Maubach at weinwolf.de
Sent: Tuesday, May 9, 2017 6:37 AM
To: Bob O'Hara <rni.boh at gmail.com>
Cc: r-help <r-help at r-project.org>
Subject: [R] Antwort: Re: Factors and Alternatives
Hi Bob,
many thanks for your reply.
I have read the documentation. In my current project I use "item
batteries" for dimensions of touchpoints which are rated by our customers.
I wrote functions to analyse them. If I create a factor before filtering
and analysing I lose the original values of the variable. If I use the
original variable for filtering and analysis it might happen that for some
dimensions values were not selected. This means they are not NA but none
of the respondents chose "4" for instance on a scale from 1 to 6. That
means that creating a factor from the analysed data with the complete
scale (1:6) fails due to the different vector lengths (number of remaining
unique values in the analysis vs values in the scale). As I have a
function doing the analysis I am looking for a way to make my function
robust to such circumstances and be able to use it to analyse all "item
batteries". Thus my question. I believe my findings are not odd. Maybe
there is a way of dealing with that kind of problem in R and I am eager to
learn how it can be solved using R.
What would you suggest?
Kind regards
Georg
From: "Bob O'Hara" <rni.boh at gmail.com>
To: G.Maubach at weinwolf.de
Cc: r-help <r-help at r-project.org>
Date: 09.05.2017 12:26
Subject: Re: [R] Factors and Alternatives
That's easy! First
> str(test3)
Factor w/ 2 levels "WITHOUT Contact",..: 2 2 2 2 1 1 1 1 1 1
tells you that the internal values are 1 and 2, and the labels are
"WITHOUT Contact" and "WITH Contact". If you read the help page for
factor() you'll see this:
  levels: an optional vector of the values (as character strings) that
          ‘x’ might have taken. The default is the unique set of
          values taken by ‘as.character(x)’, sorted into increasing
          order _of ‘x’_. Note that this set can be specified as
          smaller than ‘sort(unique(x))’.

  labels: _either_ an optional character vector of (unique) labels for
          the levels (in the same order as ‘levels’ after removing
          those in ‘exclude’), _or_ a character string of length 1.
So, when you create test3 you say that test1 can take values 0 and 1,
and these should be labelled as "WITHOUT Contact" and "WITH Contact".
So R internally codes "0" as 1 and "1" as 2 (internally R codes
factors as integers, which can be both useful and dangerous), and then
gives them labels "WITHOUT Contact" and "WITH Contact". It now doesn't
care that they were 1 and 0, because you've told it to change the
labels.
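
A short sketch of that internal coding, using the same test1 values as the original post. The variable names codes and shown are only for illustration:

```r
test1 <- c(rep(1, 4), rep(0, 6))
test3 <- factor(test1,
                levels = c(0, 1),
                labels = c("WITHOUT Contact", "WITH Contact"))

codes <- as.integer(test3)    # internal integer codes: 2 2 2 2 1 1 1 1 1 1
shown <- as.character(test3)  # the labels, not the original 0/1 values
levels(test3)                 # "WITHOUT Contact" "WITH Contact"

# The original 0/1 values are no longer stored in the factor,
# so a comparison against 0 matches nothing.
any(test3 == 0)               # FALSE
```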
If you want to filter by the original values, then don't change the
labels (or at least not until after you've filtered by the original
labels), or convert the filter to the new labels. You're asking for a
data structure with two sets of labels, which sounds odd in general.
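
One way to follow this advice, sketched in base R rather than dplyr so that it runs without extra packages: keep the original values in their own column and store the labelled factor beside them. The column names contact and contact_f are made up for this example.

```r
test1 <- c(rep(1, 4), rep(0, 6))
d <- data.frame(contact = test1)

# Labelled factor stored next to the original values, not instead of them
d$contact_f <- factor(d$contact,
                      levels = c(0, 1),
                      labels = c("WITHOUT Contact", "WITH Contact"))

# Option 1: filter on the original values
by_value <- d[d$contact == 0, ]

# Option 2: filter on the new labels
by_label <- d[d$contact_f == "WITHOUT Contact", ]

nrow(by_value)   # 6
nrow(by_label)   # 6
```

With dplyr loaded, the equivalent calls would be filter(d, contact == 0) and filter(d, contact_f == "WITHOUT Contact").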
Bob
On 9 May 2017 at 12:12, <G.Maubach at weinwolf.de> wrote:
> Hi All,
>
> I am using factors in a study for the social sciences.
>
> I discovered the following:
>
> -- cut --
>
> library(dplyr)
>
> test1 <- c(rep(1, 4), rep(0, 6))
> d_test1 <- data.frame(test1)
>
> test2 <- factor(test1)
> d_test2 <- data.frame(test2)
>
> test3 <- factor(test1,
>                 levels = c(0, 1),
>                 labels = c("WITHOUT Contact", "WITH Contact"))
> d_test3 <- data.frame(test3)
>
> d_test1 %>% filter(test1 == 0) # works OK
> d_test2 %>% filter(test2 == 0) # works OK
> d_test3 %>% filter(test3 == 0) # does not work, why?
>
> myf <- function(ds) {
> print(levels(ds$test3))
> print(labels(ds$test3))
> print(as.numeric(ds$test3))
> print(as.character(ds$test3))
> }
>
> # This shows that it is not possible to access the original
> # values which were the basis to build the factor:
> myf(d_test3)
>
> -- cut --
>
> Why is it not possible to use a factor with labels for filtering with
> the original values?
> Is there a data structure that works like a factor but gives also access
> to the original values?
>
> Kind regards
>
> Georg
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Bob O'Hara
NOTE NEW ADDRESS!!!
Institutt for matematiske fag
NTNU
7491 Trondheim
Norway
Mobile: +49 1515 888 5440
Journal of Negative Results - EEB: www.jnr-eeb.org