[R] Antwort: RE: Interdependencies of variable types, logical expressions and NA

PIKAL Petr petr.pikal at precheza.cz
Thu Apr 28 15:29:46 CEST 2016


Hi

The ?factor help page says, rather cryptically:

The encoding of the vector happens as follows. First all the values in exclude are removed from levels. If x[i] equals levels[j], then the i-th element of the result is j. If no match is found for x[i] in levels (which will happen for excluded values) then the i-th element of the result is set to NA.

So if you specify levels when calling factor(), every value of x that does not match one of the levels is converted to NA.
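Here is a minimal R sketch of that rule. It may also explain why var2 turns into NA later in this thread. As far as I can tell from the source of factor(), x is converted to character before it is matched against the levels, so TRUE and FALSE (as the strings "TRUE"/"FALSE") never match levels 0 and 1. The labels are taken from Georg's example. The as.integer() step is my own suggested workaround, not something proposed in the thread.

```r
# Values of x that match no element of `levels` become NA
x <- c(0, 1, 2)
f <- factor(x, levels = c(0, 1))
print(f)        # the 2 has no matching level, so it is NA

# A logical vector is matched as "TRUE"/"FALSE", which never equals "0" or "1"
lg <- c(TRUE, FALSE)
f_bad <- factor(lg, levels = c(0, 1), labels = c("NOT ok", "OK"))
print(f_bad)    # both elements are NA

# Converting the logicals to 1/0 first (assumed workaround) makes them match
f_ok <- factor(as.integer(lg), levels = c(0, 1), labels = c("NOT ok", "OK"))
print(f_ok)
```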

Factors are useful but sometimes their behaviour is rather tricky.

> x<-c(1,1)
> x.f<-factor(x, levels=c(0:1))
> c(x.f,2)
[1] 2 2 2
> x.f
[1] 1 1
Levels: 0 1
> as.numeric(x.f)
[1] 2 2
> x<-c(0,1,2)
> x.f<-factor(x, levels=c(0:1))
> x.f
[1] 0    1    <NA>
Levels: 0 1
> c(x.f,2)
[1]  1  2 NA  2
>
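The numbers printed by as.numeric(x.f) and c(x.f, 2) above are the internal integer codes, that is, positions in levels, and not the original values. The following short sketch shows how to recover the values. The as.numeric(levels(f))[f] form is the idiom recommended in R FAQ 7.10.

```r
x.f <- factor(c(1, 1), levels = c(0, 1))

codes  <- as.numeric(x.f)                # internal codes: position of each value in levels
vals1  <- as.numeric(as.character(x.f))  # original values via the labels
vals2  <- as.numeric(levels(x.f))[x.f]   # same values, converting each level only once
print(codes)
print(vals1)
print(vals2)
```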
Cheers
Petr


> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of PIKAL Petr
> Sent: Thursday, April 28, 2016 2:32 PM
> To: G.Maubach at weinwolf.de
> Cc: r-help at r-project.org
> Subject: Re: [R] Antwort: RE: Interdependencies of variable types, logical
> expressions and NA
>
> Hi
>
> Your initial ds:
>
> > str(ds)
> 'data.frame':   2 obs. of  3 variables:
>  $ var1: num  1 1
>  $ var2: logi  TRUE FALSE
>  $ var3: logi  NA NA
>
> The first result:
> > str(ds)
> 'data.frame':   2 obs. of  6 variables:
>  $ var1             : num  1 1
>  $ var2             : logi  TRUE FALSE
>  $ var3             : logi  NA NA
>  $ value_and_logical: logi  TRUE TRUE
>  $ logical_and_na   : logi  TRUE NA
>  $ value_and_na     : logi  TRUE TRUE
>
> 1 is treated as TRUE, therefore OR gives TRUE TRUE in the first case, TRUE NA
> in the second and TRUE TRUE in the third.
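The sketch below runs the same three OR operations on plain vectors. A numeric 1 (in fact any non-zero number) is treated as TRUE, and 0 is treated as FALSE. Note also that ifelse(cond, TRUE, FALSE) returns cond unchanged here, so the ifelse() wrapper in Georg's code could be dropped.

```r
var1 <- c(1, 1)
var2 <- c(TRUE, FALSE)
var3 <- c(NA, NA)

r1 <- var1 | var2   # 1 counts as TRUE
r2 <- var2 | var3   # FALSE | NA cannot be decided, so NA
r3 <- var1 | var3   # TRUE | NA is TRUE whatever NA stands for
r4 <- 0 | NA        # 0 counts as FALSE, so NA

# ifelse(cond, TRUE, FALSE) gives back cond itself
same <- identical(ifelse(r2, TRUE, FALSE), r2)
print(list(r1, r2, r3, r4, same))
```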
>
> Converting to factor changes var2 to NA (I am not sure why).
>
> > str(ds)
> 'data.frame':   2 obs. of  3 variables:
>  $ var1: Factor w/ 2 levels "NOT ok","OK": 2 2
>  $ var2: Factor w/ 2 levels "NOT ok","OK": NA NA
>  $ var3: Factor w/ 2 levels "NOT ok","OK": NA NA
>
> And this results in warnings:
>
> > ds$value_and_logical <- ifelse(ds$var1 | ds$var2, TRUE, FALSE)
> Warning message:
> In Ops.factor(ds$var1, ds$var2) : '|' not meaningful for factors
> > ds$logical_and_na <- ifelse(ds$var2 | ds$var3, TRUE, FALSE)
> Warning message:
> In Ops.factor(ds$var2, ds$var3) : '|' not meaningful for factors
> > ds$value_and_na <- ifelse(ds$var1 | ds$var3, TRUE, FALSE)
> Warning message:
> In Ops.factor(ds$var1, ds$var3) : '|' not meaningful for factors
> > str(ds)
> 'data.frame':   2 obs. of  6 variables:
>  $ var1             : Factor w/ 2 levels "NOT ok","OK": 2 2
>  $ var2             : Factor w/ 2 levels "NOT ok","OK": NA NA
>  $ var3             : Factor w/ 2 levels "NOT ok","OK": NA NA
>  $ value_and_logical: logi  NA NA
>  $ logical_and_na   : logi  NA NA
>  $ value_and_na     : logi  NA NA
> >
>
> So the | operation is not meaningful for factor variables and results in NA values.
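One way to get the Test 1 results back while keeping factor columns is sketched below. It assumes that "OK" is the label meaning TRUE. Each factor is compared with that label (== is one of the two operators Ops.factor supports), which yields a logical vector, and those logicals are combined with |. is_ok() is a hypothetical helper name. Converting the logicals with as.integer() before calling factor() is also my assumption; without it they all turn into NA.

```r
lv <- c(0, 1)
lb <- c("NOT ok", "OK")
ds <- data.frame(var1 = c(1, 1), var2 = c(TRUE, FALSE), var3 = c(NA, NA))
ds$var1 <- factor(ds$var1, levels = lv, labels = lb)
# as.integer() maps TRUE/FALSE to 1/0 so the values match the levels
ds$var2 <- factor(as.integer(ds$var2), levels = lv, labels = lb)
ds$var3 <- factor(as.integer(ds$var3), levels = lv, labels = lb)

# Hypothetical helper: TRUE where the factor holds "OK", NA where it is missing
is_ok <- function(f) f == "OK"

ds$value_and_logical <- is_ok(ds$var1) | is_ok(ds$var2)
ds$logical_and_na    <- is_ok(ds$var2) | is_ok(ds$var3)
ds$value_and_na      <- is_ok(ds$var1) | is_ok(ds$var3)
print(ds)
```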
>
> Cheers
> Petr
>
>
>
> > -----Original Message-----
> > From: G.Maubach at weinwolf.de [mailto:G.Maubach at weinwolf.de]
> > Sent: Thursday, April 28, 2016 12:00 PM
> > To: PIKAL Petr <petr.pikal at precheza.cz>
> > Subject: Antwort: RE: [R] Interdependencies of variable types, logical
> > expressions and NA
> >
> > Hi Petr,
> >
> > many thanks for your reply.
> >
> > Yes, it's interesting. I did not understand what the truth table was meant to
> > say, because it shows 4 columns instead of 3. But now I get it.
> >
> > The other thing is that logical expressions with NA work differently on
> > different types of variables, as my example code shows:
> >
> > -- cut --
> > # Truth table for logicals and NA
> >
> > var2 <- c(TRUE, FALSE)
> > var3 <- c(NA, NA)
> > var1 <- c(1, 1)
> > ds <- data.frame(var1, var2, var3)
> > ds
> >
> > ds$value_and_logical <- ifelse(ds$var1 | ds$var2, TRUE, FALSE)
> > ds$logical_and_na <- ifelse(ds$var2 | ds$var3, TRUE, FALSE)
> > ds$value_and_na <- ifelse(ds$var1 | ds$var3, TRUE, FALSE)
> >
> > print(ds)
> >
> > ds$var1 <- factor(ds$var1, levels = c(0, 1), labels = c("NOT ok", "OK"))
> > ds$var2 <- factor(ds$var2, levels = c(0, 1), labels = c("NOT ok", "OK"))
> > ds$var3 <- factor(ds$var3, levels = c(0, 1), labels = c("NOT ok", "OK"))
> >
> > ds$value_and_logical <- ifelse(ds$var1 | ds$var2, TRUE, FALSE)
> > ds$logical_and_na <- ifelse(ds$var2 | ds$var3, TRUE, FALSE)
> > ds$value_and_na <- ifelse(ds$var1 | ds$var3, TRUE, FALSE)
> >
> > print(ds)
> > -- cut --
> >
> > Additionally, the warning message that this script issues was not displayed
> > when running my production code, only in this test code.
> >
> > Also: Is "<NA>" the same as "NA"?
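As far as I know, they are the same missing value. When R prints factors and character vectors without quotes (which includes such columns of a data frame), it shows a missing value as <NA>, so that it can be told apart from a real string "NA". Numeric and logical columns show NA. Here is a small sketch:

```r
f <- factor(c("a", NA))
print(f)            # the missing element is displayed as <NA>
miss_f <- is.na(f)  # it is an ordinary missing value

# The string "NA" is a normal value and becomes a level of its own
s <- factor(c("a", "NA"))
miss_s <- is.na(s)
print(levels(s))
```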
> >
> > Kind regards
> >
> > Georg
> >
> >
> >
> >
> > From:    PIKAL Petr <petr.pikal at precheza.cz>
> > To:      "G.Maubach at weinwolf.de" <G.Maubach at weinwolf.de>,
> > "r-help at r-project.org" <r-help at r-project.org>,
> > Date:    28.04.2016 10:02
> > Subject: RE: [R] Interdependencies of variable types, logical
> > expressions and NA
> >
> >
> >
> > Sorry, these
> >
> > T&NA = T (you can decide that regardless of the value of NA the result must be T)
> > F&NA = NA (you cannot decide, hence NA)
> >
> > should be
> >
> > T | NA = T (you can decide that regardless of the value of NA the result must be T)
> > F | NA = NA (you cannot decide, hence NA)
> >
> > Cheers
> > Petr
> >
> > > -----Original Message-----
> > > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of PIKAL Petr
> > > Sent: Thursday, April 28, 2016 9:42 AM
> > > To: G.Maubach at weinwolf.de; r-help at r-project.org
> > > Subject: Re: [R] Interdependencies of variable types, logical
> > > expressions and NA
> > >
> > > Hi
> > >
> > > Your script is not reproducible.
> > >
> > > Creating Check_U_0__Kd_1_2011 from Umsatz_2011 and Kunde01_2011
> > > Error in ifelse(Kunden01[[Umsatz]] == 0 & Kunden01[[Kunde]] == 1, 1, 0) :
> > >   object 'Kunden01' not found
> > > >
> > >
> > > This is interesting
> > > x <- c(NA, FALSE, TRUE)
> > > names(x) <- as.character(x)
> > > outer(x, x, "&") ## AND table
> > >        <NA> FALSE  TRUE
> > > <NA>     NA FALSE    NA
> > > FALSE FALSE FALSE FALSE
> > > TRUE     NA FALSE  TRUE
> > > >
> > >
> > > I am not sure, but the logic for AND is to return TRUE only when both
> > > expressions are TRUE.
> > >
> > > so
> > > T&T = T
> > > F&F = F
> > > T&NA = NA (you cannot decide hence NA)
> > > F&NA = F (you can decide that regardless of NA the result must be F)
> > >
> > > outer(x, x, "|") ## OR  table
> > >       <NA> FALSE TRUE
> > > <NA>    NA    NA TRUE
> > > FALSE   NA FALSE TRUE
> > > TRUE  TRUE  TRUE TRUE
> > >
> > > OTOH, the logic for the OR table is that if either expression is TRUE,
> > > the result must be TRUE:
> > > T | T = T
> > > F | F = F
> > > T&NA = T (you can decide that regardless of the value of NA the result must be T)
> > > F&NA = NA (you cannot decide, hence NA)
> > >
> > > And I believe that all your results can be explained by this logic.
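Here is a sketch of how to read the outer() tables. Cell [r, c] holds x[r] OP x[c], so you pick the left operand by row and the right operand by column. The row labels make the printed table look like 4 columns. I named the elements with plain strings ("NA" instead of a missing name) only so that cells can be indexed by name. That naming is my own change to the help-page example.

```r
x <- c(NA, FALSE, TRUE)
names(x) <- c("NA", "FALSE", "TRUE")   # plain strings so cells can be indexed by name

and_tab <- outer(x, x, "&")   # cell [r, c] is x[r] & x[c]
or_tab  <- outer(x, x, "|")   # cell [r, c] is x[r] | x[c]
print(and_tab)
print(or_tab)

t_and_na <- and_tab["TRUE", "NA"]    # depends on the unknown value: NA
f_and_na <- and_tab["FALSE", "NA"]   # FALSE whatever NA stands for
t_or_na  <- or_tab["TRUE", "NA"]     # TRUE whatever NA stands for
f_or_na  <- or_tab["FALSE", "NA"]    # depends on the unknown value: NA
```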
> > >
> > > Cheers
> > > Petr
> > >
> > >
> > > > -----Original Message-----
> > > > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of
> > > > G.Maubach at weinwolf.de
> > > > Sent: Thursday, April 28, 2016 9:08 AM
> > > > To: r-help at r-project.org
> > > > Subject: [R] Interdependencies of variable types, logical expressions
> > > > and NA
> > > >
> > > > Hi All,
> > > >
> > > > my script tries to do the following on factors:
> > > >
> > > > > ## Check for case 3: Umsatz = 0 & Kunde = 1
> > > > > for (year in 2011:2015) {
> > > > +   Umsatz <- paste0("Umsatz_", year)
> > > > +   Kunde <- paste0("Kunde01_", year)
> > > > +   Check <- paste0("Check_U_0__Kd_1_", year)
> > > > +
> > > > +   cat('Creating', Check, 'from', Umsatz, "and", Kunde, '\n')
> > > > +
> > > > +   Kunden01[[ Check ]] <- ifelse(Kunden01[[ Umsatz ]] == 0 &
> > > > +                                 Kunden01[[ Kunde ]] == 1,
> > > > +                                 1, 0
> > > > +                                 )
> > > > +   Kunden01[[ Check ]] <- factor(Kunden01[[ Check ]],
> > > > +                                 levels=c(1, 0),
> > > > +                                 labels= c("Check 0", "OK")
> > > > +                                 )
> > > > +
> > > > + }
> > > > Creating Check_U_0__Kd_1_2011 from Umsatz_2011 and Kunde01_2011
> > > > Creating Check_U_0__Kd_1_2012 from Umsatz_2012 and Kunde01_2012
> > > > Creating Check_U_0__Kd_1_2013 from Umsatz_2013 and Kunde01_2013
> > > > Creating Check_U_0__Kd_1_2014 from Umsatz_2014 and Kunde01_2014
> > > > Creating Check_U_0__Kd_1_2015 from Umsatz_2015 and Kunde01_2015
> > > > >
> > > > > table(Kunden01$Check_U_0__Kd_1_2011, useNA = "ifany")
> > > >
> > > > Check 0      OK    <NA>
> > > >       1      16      13
> > > > > table(Kunden01$Check_U_0__Kd_1_2012, useNA = "ifany")
> > > >
> > > > Check 0      OK    <NA>
> > > >       1      17      12
> > > > > table(Kunden01$Check_U_0__Kd_1_2013, useNA = "ifany")
> > > >
> > > > Check 0      OK    <NA>
> > > >       2      17      13
> > > > > table(Kunden01$Check_U_0__Kd_1_2014, useNA = "ifany")
> > > >
> > > > Check 0      OK    <NA>
> > > >       1      15      14
> > > > > table(Kunden01$Check_U_0__Kd_1_2015, useNA = "ifany")
> > > >
> > > > Check 0      OK    <NA>
> > > >       2      15      13
> > > > >
> > > > > Kunden01$Check_U_0__Kd_1_all <- ifelse(Kunden01$Check_U_0__Kd_1_2011 == 1 |
> > > > +                                        Kunden01$Check_U_0__Kd_1_2012 == 1 |
> > > > +                                        Kunden01$Check_U_0__Kd_1_2013 == 1 |
> > > > +                                        Kunden01$Check_U_0__Kd_1_2014 == 1 |
> > > > +                                        Kunden01$Check_U_0__Kd_1_2015 == 1,
> > > > +                                        1, 0)
> > > > >
> > > > > table(Kunden01$Check_U_0__Kd_1_all, useNA = "ifany")
> > > >
> > > >     0  <NA>
> > > >     7    23
> > > >
> > > > (Note: I made the values up, but the relations match the real-world data.)
> > > >
> > > > I had expected to get back a factor, or at least a numeric variable
> > > > containing 0, 1 and NA; instead, 1 does not appear at all.
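Judging from the loop above, the Check_* columns are factors whose values are the labels "Check 0" and "OK", not 1 and 0. That means == 1 is never TRUE: it is FALSE for every non-missing element and NA for the missing ones, which would explain why only 0 and NA appear. The sketch below uses a small made-up stand-in for Kunden01, since the real data is not available. The two-year layout and the values are hypothetical.

```r
# Hypothetical stand-in for Kunden01 with two years only (made-up values)
lab <- c("Check 0", "OK")
Kunden01 <- data.frame(
  Check_U_0__Kd_1_2011 = factor(c(1, 0, NA, 0), levels = c(1, 0), labels = lab),
  Check_U_0__Kd_1_2012 = factor(c(0, 0, 0, NA), levels = c(1, 0), labels = lab)
)

# The stored values are the labels, so "== 1" compares "Check 0"/"OK" with "1":
# FALSE for every non-missing element, NA for the missing one
eq_one <- Kunden01$Check_U_0__Kd_1_2011 == 1
print(eq_one)

# Compare with the label that marks the case instead, then OR the years together
years <- 2011:2012
flags <- lapply(paste0("Check_U_0__Kd_1_", years),
                function(col) Kunden01[[col]] == "Check 0")
Kunden01$Check_U_0__Kd_1_all <- ifelse(Reduce(`|`, flags), 1, 0)
table(Kunden01$Check_U_0__Kd_1_all, useNA = "ifany")
```

Rows that are flagged in at least one year come out as 1 even if another year is NA, because TRUE | NA is TRUE. If a missing year should count as "not flagged", using %in% "Check 0" instead of == "Check 0" turns NA into FALSE.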
> > > >
> > > > I searched the web for information on the treatment of logical expressions
> > > > when the data contains NA. I found:
> > > >
> > > > 1.
> > > > https://stat.ethz.ch/R-manual/R-devel/library/base/html/NA.html
> > > > Examples
> > > > # Some logical operations do not return NA
> > > > c(TRUE, FALSE) & NA
> > > > c(TRUE, FALSE) | NA
> > > >
> > > > 2.
> > > > https://stat.ethz.ch/R-manual/R-devel/library/base/html/Logic.html
> > > > NA is a valid logical object. Where a component of x or y is NA, the
> > > > result will be NA if the outcome is ambiguous. In other words NA & TRUE
> > > > evaluates to NA, but NA & FALSE evaluates to FALSE. See the examples
> > > > below.
> > > >
> > > > ## construct truth tables :
> > > > x <- c(NA, FALSE, TRUE)
> > > > names(x) <- as.character(x)
> > > > outer(x, x, "&") ## AND table
> > > > outer(x, x, "|") ## OR  table
> > > > Note: not very useful. How should it be read?
> > > >
> > > > 3.
> > > > http://www.ats.ucla.edu/stat/r/faq/missing.htm
> > > > Good explanation for NA in general and in analysis, but no information
> > > > about NA in logical expressions.
> > > >
> > > > Then I ran some tests with different data types and variables with NA:
> > > >
> > > > -- cut --
> > > >
> > > > # 2016-04-27-001_truth_table_for_logicals_and_NA.R
> > > >
> > > > # Test 1
> > > > var2 <- c(TRUE, FALSE)
> > > > var3 <- c(NA, NA)
> > > > var1 <- c(1, 1)
> > > > ds <- data.frame(var1, var2, var3)
> > > > ds
> > > >
> > > > ds$value_and_logical <- ifelse(ds$var1 | ds$var2, TRUE, FALSE)
> > > > ds$logical_and_na <- ifelse(ds$var2 | ds$var3, TRUE, FALSE)
> > > > ds$value_and_na <- ifelse(ds$var1 | ds$var3, TRUE, FALSE)
> > > >
> > > > print(ds)
> > > > # Output
> > > > # var1  var2 var3 value_and_logical logical_and_na value_and_na
> > > > # 1    1  TRUE   NA              TRUE           TRUE         TRUE
> > > > # 2    1 FALSE   NA              TRUE             NA         TRUE
> > > >
> > > > # Test 2
> > > > ds$var1 <- factor(ds$var1, levels = c(0, 1), labels = c("NOT ok", "OK"))
> > > > ds$var2 <- factor(ds$var2, levels = c(0, 1), labels = c("NOT ok", "OK"))
> > > > ds$var3 <- factor(ds$var3, levels = c(0, 1), labels = c("NOT ok", "OK"))
> > > >
> > > > ds$value_and_logical <- ifelse(ds$var1 | ds$var2, TRUE, FALSE)
> > > > ds$logical_and_na <- ifelse(ds$var2 | ds$var3, TRUE, FALSE)
> > > > ds$value_and_na <- ifelse(ds$var1 | ds$var3, TRUE, FALSE)
> > > >
> > > > # Output (abbrev.)
> > > > # Warning message:
> > > > #  In Ops.factor(ds$var1, ds$var3) : '|' not meaningful for factors
> > > >
> > > > print(ds)
> > > > # Output
> > > > # var1 var2 var3 value_and_logical logical_and_na value_and_na
> > > > # 1   OK <NA> <NA>                NA             NA           NA
> > > > # 2   OK <NA> <NA>                NA             NA           NA
> > > >
> > > > -- cut --
> > > >
> > > > I had expected to get the same result in Test 2 as in Test 1.
> > > >
> > > > Where can I find information and documentation about NA handling in
> > > > logical expressions on different variable types?
> > > >
> > > > Kind regards
> > > >
> > > > Georg
> > > >