[R] How to replace NAs in a vector of factors?

Bill.Venables at csiro.au Bill.Venables at csiro.au
Wed Jul 22 03:22:17 CEST 2009


A couple of points:

1. If you are going to be replacing entries in factors with updated levels, it's probably easier to keep your strings as strings when they go into the data frames.  So here is how I would start your example:


db1 <- data.frame(
    olditems = c('soup','','','','nuts'),
    prices = c(4.45, 3.25, 4.42, 2.25, 3.98),
    stringsAsFactors = FALSE)
db2 <- data.frame(
    newitems = c('stew','crackers','tofu','goatsmilk','peanuts'),
    stringsAsFactors = FALSE)
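
To see why point 1 matters, here is a small sketch comparing the two ways of building the column. The names as_factor and as_string are made up for illustration. (In R 4.0.0 and later stringsAsFactors already defaults to FALSE, so the factor case has to be requested explicitly.)

```r
# Build the same column both ways and compare the classes.
as_factor <- data.frame(olditems = c('soup', '', 'nuts'),
                        stringsAsFactors = TRUE)
as_string <- data.frame(olditems = c('soup', '', 'nuts'),
                        stringsAsFactors = FALSE)

class(as_factor$olditems)   # "factor"
class(as_string$olditems)   # "character"

# 'crackers' is not an existing level, so the factor entry becomes NA
# (R warns "invalid factor level, NA generated"; suppressed here).
suppressWarnings(as_factor$olditems[2] <- 'crackers')

# A character column simply takes the new string.
as_string$olditems[2] <- 'crackers'
```
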


2. Strings with zero characters are still strings (like zero is still a number).  They are not missing.  If you want them to be made missing you can do so afterwards with:


#### zero length strings become NA 
is.na(db1$olditems[db1$olditems == '']) <- TRUE
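
As a sketch of the same idea on a bare character vector (x is just an illustrative name), plain assignment of NA should give the same result as the is.na<- form:

```r
x <- c('soup', '', '', '', 'nuts')

is.na(x)           # all FALSE: '' is a perfectly valid string
nchar(x)           # 4 0 0 0 4

x[x == ''] <- NA   # mark the zero-length strings as missing
is.na(x)           # FALSE TRUE TRUE TRUE FALSE
```
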


3. Now to replace the missing values with the corresponding ones from the second data frame:


k <- is.na(db1$olditems)
db1[k, "olditems"] <- db2[k, "newitems"]
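
An alternative sketch of step 3 using ifelse() on bare vectors (old, new and filled are illustrative names). Like the indexing version, it assumes the two vectors have the same length and line up position by position:

```r
old <- c('soup', NA, NA, NA, 'nuts')
new <- c('stew', 'crackers', 'tofu', 'goatsmilk', 'peanuts')

k <- is.na(old)                 # TRUE where a replacement is needed
filled <- ifelse(k, new, old)   # take new where missing, else keep old
filled
```
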


4. Check:

> db1
   olditems prices
1      soup   4.45
2  crackers   3.25
3      tofu   4.42
4 goatsmilk   2.25
5      nuts   3.98
> 

5. If you really do want factors rather than character strings, you can now change back:

db1 <- within(db1, olditems <- factor(olditems)) ## use <- here, not =: inside a call, = would be read as a named argument

6. Check the difference:

> str(db1)
'data.frame':   5 obs. of  2 variables:
 $ olditems: Factor w/ 5 levels "crackers","goatsmilk",..: 4 1 5 2 3
 $ prices  : num  4.45 3.25 4.42 2.25 3.98
> 
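
If the columns are already factors and converting to character is not convenient, one possible approach (a sketch only, not what I did above) is to extend the levels before assigning. The vectors old and new here are illustrative stand-ins for the two columns:

```r
old <- factor(c('soup', NA, NA, NA, 'nuts'))
new <- factor(c('stew', 'crackers', 'tofu', 'goatsmilk', 'peanuts'))

k <- is.na(old)

# Add the new labels to the level set first; without this step the
# assignment below would produce NA with an "invalid factor level" warning.
levels(old) <- union(levels(old), levels(new))

old[k] <- new[k]
old <- droplevels(old)   # discard levels that ended up unused
as.character(old)
```
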
 


Bill Venables
http://www.cmis.csiro.au/bill.venables/ 


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Gene Leynes
Sent: Wednesday, 22 July 2009 10:39 AM
To: r-help at r-project.org
Subject: [R] How to replace NAs in a vector of factors?

# Just when I thought I had the basic stuff mastered....
# This has been quite perplexing, thanks for any help


## Here's the example:

db1=data.frame(
    olditems=c('soup','','','','nuts'),
    prices=c(4.45, 3.25, 4.42, 2.25, 3.98))
db2=data.frame(
    newitems=c('stew','crackers','tofu','goatsmilk','peanuts'))

str(db1)    #factors and prices
str(db2)    #new names, but I want *only* the updates

is.na(db1$olditems)  #a little surprising that '' is not equal to NA
db1$olditems==''     #oh good, at least I can get to the blanks this way
db1$olditems[db1$olditems=='']  #wait, only one item is returned?
db1[db1$olditems=='',]  #somehow this works!

#how would I get the new item names into the old items column of db1??
# I was expecting that this would work:
#    db1$olditems[db1$olditems=='']=
#        db2$newitems[db1$olditems=='']
