[R] Changing entries of column of type "factor"/Adding a new level to a factor

David Winsemius dwinsemius at comcast.net
Mon Aug 27 22:52:00 CEST 2012


On Aug 27, 2012, at 12:18 PM, Bert Gunter wrote:

> Well ...See below.
>
> -- Cheers, Bert
>
> On Mon, Aug 27, 2012 at 9:19 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Aug 27, 2012, at 3:09 AM, Fridolin wrote:
>>
>>> What is a smart way to change an entry inside a column of a data frame or
>>> matrix that is of type "factor"?
>>>
>>> Here is my script incl. input data:
>>>>
>>>> #set working directory:
>>>> setwd("K:/R")
>>>>
>>>> #read in data:
>>>> input<-read.table("Exampleinput.txt", sep="\t", header=TRUE)
>>>>
>>>> #check data:
>>>> input
>>>
>>>  Ind      M1      M2      M3
>>> 1    1   96/98 120/120     0/0
>>> 2    2 102/108 120/124 305/305
>>> 3    3  96/108 120/120     0/0
>>> 4    4     0/0 116/120 300/305
>>> 5    5  96/108 120/130 300/305
>>> 6    6   98/98 116/120 300/305
>>> 7    7  98/108 120/120 305/305
>>> 8    8  98/108 120/120 305/305
>>> 9    9  98/102 120/124 300/300
>>> 10  10 108/108 120/120 305/305
>>>>
>>>> str(input)
>>>
>>> 'data.frame':   10 obs. of  4 variables:
>>> $ Ind: int  1 2 3 4 5 6 7 8 9 10
>>> $ M1 : Factor w/ 8 levels "0/0","102/108",..: 5 2 4 1 4 8 7 7 6 3
>>> $ M2 : Factor w/ 4 levels "116/120","120/120",..: 2 3 2 1 4 1 2 2 3 2
>>> $ M3 : Factor w/ 4 levels "0/0","300/300",..: 1 4 1 3 3 3 4 4 2 4
>>>>
>>>>
>>>> #replace 0/0 by 999/999:
>>>> for (r in 1:10)
>>>
>>> +   for (c in 2:4)
>>> +     if (input[r,c]=="0/0") input[r,c]<-"999/999"
>>> Warning messages:
>>> 1: In `[<-.factor`(`*tmp*`, iseq, value = "999/999") :
>>> invalid factor level, NAs generated
>>> 2: In `[<-.factor`(`*tmp*`, iseq, value = "999/999") :
>>> invalid factor level, NAs generated
>>> 3: In `[<-.factor`(`*tmp*`, iseq, value = "999/999") :
>>> invalid factor level, NAs generated
>>>>
>>>> input
>>>
>>>  Ind      M1      M2      M3
>>> 1    1   96/98 120/120    <NA>
>>> 2    2 102/108 120/124 305/305
>>> 3    3  96/108 120/120    <NA>
>>> 4    4    <NA> 116/120 300/305
>>> 5    5  96/108 120/130 300/305
>>> 6    6   98/98 116/120 300/305
>>> 7    7  98/108 120/120 305/305
>>> 8    8  98/108 120/120 305/305
>>> 9    9  98/102 120/124 300/300
>>> 10  10 108/108 120/120 305/305
>>>
>>>
>>> I want to replace all "0/0" by "999/999". My code should work for columns
>>> of type "character" and "integer". But to make it work for a "factor"
>>> column I would need to add the new level "999/999" first, I guess. How do
>>> I add a new level?
>>
>>
>> ?levels
>>
>> levels(input$M1) <- c(levels(input$M1), "999/999")
>
> This adds an additional level; then you have to replace the "0/0"
> level with this one; then you have to call levels again to remove the
> "0/0" level.

Then do it this way (different from what I thought was originally desired):

 > x <- factor(letters[1:3])
 > levels(x) <- c("d", levels(x)[2:3])
 > x
[1] d b c
Levels: d b c
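
Applied to the questioner's data, the same idea can be sketched as below. This is only a sketch under assumptions: the data frame is a hand-typed stand-in for the first four rows of the posted input (not the real file), `is_fac` is a helper name made up for the example, and the level is located by name rather than by position so that columns without a "0/0" level are left unchanged.

```r
# Stand-in for the posted data (first four rows only, typed by hand).
# stringsAsFactors = TRUE makes the M columns factors, as in the posted str() output.
input <- data.frame(
  Ind = 1:4,
  M1  = c("96/98", "102/108", "96/108", "0/0"),
  M2  = c("120/120", "120/124", "120/120", "116/120"),
  M3  = c("0/0", "305/305", "0/0", "300/305"),
  stringsAsFactors = TRUE
)

# Hypothetical helper vector: TRUE for the factor columns only.
is_fac <- vapply(input, is.factor, logical(1))

# Rename the "0/0" level to "999/999" in every factor column.
# A column with no "0/0" level (M2 here) passes through unchanged.
input[is_fac] <- lapply(input[is_fac], function(f) {
  levels(f)[levels(f) == "0/0"] <- "999/999"
  f
})

input
```

Because only the level label changes, no loop over rows is needed and no NAs are generated.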

>
> I think the following slight tweak may be preferred, as illustrated
> with a little example (opinions?):
>
>> x <- factor(letters[1:3])
>> x
> [1] a b c
> Levels: a b c
>
> ## create a new levels vector
>> newlvl <- levels(x)
>> newlvl[newlvl == "a"] <- "d"
>
> ## Create the new factor and replace the old with it
>
>> x <- factor(newlvl[x])
>> x
> [1] d b c
> Levels: b c d
>
> Note, however, as Bill D. said, in either case your level ordering --
> which will be used, e.g. in printing and displaying -- will be weird.

So the levels<- method above, which keeps the renamed level in its
original position, might be what you expect. Several options are now
available to the questioner.
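
For comparison, here is a sketch of the three approaches discussed in this thread side by side, using the same toy factor. The use of droplevels() for the final step of the first option is an assumption on my part (the thread only says to call levels again), and the names x1, x2 and x3 are made up for the example.

```r
x <- factor(letters[1:3])

# Option 1: add the new level, assign it, then drop the now-unused old level.
# droplevels() is assumed here for the last step; it is not named in the thread.
x1 <- x
levels(x1) <- c(levels(x1), "d")
x1[x1 == "a"] <- "d"
x1 <- droplevels(x1)
levels(x1)   # "b" "c" "d"

# Option 2: rename the level in place; the new label keeps the old position.
x2 <- x
levels(x2)[levels(x2) == "a"] <- "d"
levels(x2)   # "d" "b" "c"

# Option 3: rebuild the factor from a relabelled levels vector;
# factor() sorts the levels again.
newlvl <- levels(x)
newlvl[newlvl == "a"] <- "d"
x3 <- factor(newlvl[x])
levels(x3)   # "b" "c" "d"
```

All three give the same values (d b c); they differ only in the order of the levels, which matters for printing, tables and plots.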

-- 
David.
>
>
>
>>
>> --
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> -- 
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

David Winsemius, MD
Alameda, CA, USA



