[R] Question concerning side effects of treating invalid factor levels

tibor@kiss m@iii@g oii rub@de
Mon Sep 19 13:38:33 CEST 2022


Dear Eric,

thank you very much. It would not have occurred to me to look up the help page for _c()_, which of course explains the coercion to the highest type.
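For readers following along, the coercion rule documented in ?c can be illustrated with a minimal sketch. The variable name `x` is only illustrative, and the fixed number 12345 stands in for the random RT value from the original example:

```r
# c() returns an atomic vector, so all elements are coerced to one common
# type, following the hierarchy logical < integer < double < character.
x <- c("in", "V>N", 12345)   # two strings and one number
class(x)                     # "character": the number became "12345"

class(c(TRUE, 1L))           # "integer"
class(c(1L, 2.5))            # "numeric"
class(c(2.5, "a"))           # "character"
```

So the vector handed to rbind() is already all character before the data frame is involved, which is why RT ends up as _chr_.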

Best

T.
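
A short sketch of the fix Eric suggests in the quoted message below: binding a one-row data frame instead of a c() vector, so that each column keeps its own type. The names `new_row` and `df2` and the call to set.seed() are assumptions added here for illustration and do not appear in the thread.

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

df <- data.frame(
  P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
  ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
  RT = round(runif(6, 7000, 16000), 0))

# A one-row data frame holds each value in its own column, so the
# number is never mixed with the strings and stays numeric.
new_row <- data.frame(P = "in", ANSWER = "V>N",
                      RT = round(runif(1, 7000, 16000), 0))

df2 <- rbind(df, new_row)
str(df2)   # RT should still be num; P and ANSWER gain a new level each
```

With this form rbind() extends the factor levels rather than inserting NA, as Eric notes.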


> On 19.09.2022 at 13:31, Eric Berger <ericjberger using gmail.com> wrote:
> 
> You are misinterpreting what is going on.
> The rbind command includes c(char, char, int) which produces a
> character vector of length 3.
> This is what you are rbind-ing which changes the type of the RT column.
> 
> If you do rbind(df, data.frame(P="in", ANSWER="V>N",
> RT=round(runif(1,7000,16000),0)))
> you will see that everything is fine. (New factor levels are created.)
> 
> HTH,
> Eric
> 
> On Mon, Sep 19, 2022 at 2:14 PM Tibor Kiss via R-help
> <r-help using r-project.org> wrote:
>> 
>> Dear List members,
>> 
>> I have tried several times now to find out about a side effect of handling invalid factor levels, but have not found an answer. Various answers on Stack Exchange etc. reproduce the behaviour that puzzles me without even mentioning it.
>> So I am asking the list (apologies if this has been treated in the past).
>> 
>> If you add an invalid factor level to a column in a data frame, this has the side effect of turning a numerical column into a column with character strings. Here is a simple example:
>> 
>>> df <- data.frame(
>>        P = factor(c("mittels", "mit", "mittels", "ueber", "mit", "mit")),
>>        ANSWER = factor(c(rep("PP>OBJ", 4), rep("OBJ>PP", 2))),
>>        RT = round(runif(6, 7000, 16000), 0))
>> 
>>> str(df)
>> 'data.frame':   6 obs. of  3 variables:
>> $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1
>> $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1
>> $ RT    : num  11157 13719 14388 14527 14686 ..
>> 
>>> df <- rbind(df, c("in", "V>N", round(runif(1, 7000, 16000), 0)))
>> 
>>> str(df)
>> 'data.frame':   7 obs. of  3 variables:
>> $ P     : Factor w/ 3 levels "mit","mittels",..: 2 1 2 3 1 1 NA
>> $ ANSWER: Factor w/ 2 levels "OBJ>PP","PP>OBJ": 2 2 2 2 1 1 NA
>> $ RT    : chr  "11478" "15819" "8305" "8852" …
>> 
>> You see that RT has changed from _num_ to _chr_ as a side effect of adding the invalid factor level as NA. I would appreciate understanding what the purpose of the type coercion is.
>> 
>> Thanks in advance
>> 
>> 
>> Tibor


