[R] Replacing values

Marc Schwartz marc_schwartz at comcast.net
Mon Dec 18 21:55:45 CET 2006


On Mon, 2006-12-18 at 10:58 -0800, downunder wrote:
> Hi all,
> 
> I have to recode some values in a dataset. for example changing all zeros to
> "." or 999 would be also ok. does anybody know how to do this? thanks in
> advance. lars

R has its own missing value designator, which is NA.  A "." or 999
would not be treated as missing by R functions, whereas NA is handled
consistently (for example, via the 'na.rm' argument). As you will note
below, a bare "." is not a valid numeric value and raises an error,
while 999 is silently treated as an ordinary number.

For example (see ?mean):

> mean(c(1, 2, 3, 0))
[1] 1.5

> mean(c(1, 2, 3, NA))
[1] NA

> mean(c(1, 2, 3, NA), na.rm = TRUE)
[1] 2

> mean(c(1, 2, 3, .), na.rm = TRUE)
Error in mean(c(1, 2, 3, .), na.rm = TRUE) : 
	object "." not found

> mean(c(1, 2, 3, 999), na.rm = TRUE)
[1] 251.25


See ?NA and ?is.na and take note of the assignment usage in the latter.

To provide some examples:

1. Vector

> Vec <- sample(0:5, 10, replace = TRUE)
> Vec
 [1] 5 3 4 5 1 4 4 0 1 0

> is.na(Vec) <- Vec == 0
> Vec
 [1]  5  3  4  5  1  4  4 NA  1 NA
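
As a side note, the same recoding can also be spelled with ordinary
logical indexing. The following is only a minimal sketch; the vector
'Vec2' and its values are made up for illustration and are not part of
the session above:

```r
# Hypothetical example vector (fixed values, so the result is reproducible)
Vec2 <- c(5, 3, 0, 1, 0)

# Logical indexing: set every element equal to 0 to NA
Vec2[Vec2 == 0] <- NA

# is.na() now flags exactly the positions that used to hold 0
which(is.na(Vec2))
```

Either form should give the same result for a plain vector; the
is.na()<- form mainly makes the intent ("mark these as missing") more
explicit.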


2. Matrix

> Mat <- matrix(sample(0:5, 20, replace = TRUE), ncol = 4)
> Mat
     [,1] [,2] [,3] [,4]
[1,]    4    4    1    4
[2,]    3    1    1    3
[3,]    3    0    1    0
[4,]    2    2    0    5
[5,]    4    0    5    1

> is.na(Mat) <- Mat == 0

> Mat
     [,1] [,2] [,3] [,4]
[1,]    4    4    1    4
[2,]    3    1    1    3
[3,]    3   NA    1   NA
[4,]    2    2   NA    5
[5,]    4   NA    5    1



3. Dataframe

> iris.tmp <- iris[1:10, ]
> iris.tmp
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa


> iris.tmp$Sepal.Length[sample(10, 3)] <- 0
> iris.tmp$Sepal.Width[sample(10, 3)] <- 0
> iris.tmp$Petal.Length[sample(10, 3)] <- 0
> iris.tmp$Petal.Width[sample(10, 3)] <- 0


> iris.tmp
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         0.0          0.0         0.2  setosa
2           4.9         0.0          1.4         0.2  setosa
3           4.7         0.0          1.3         0.0  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.0  setosa
6           5.4         3.9          0.0         0.0  setosa
7           0.0         3.4          1.4         0.3  setosa
8           0.0         3.4          0.0         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          0.0         3.1          1.5         0.1  setosa


> is.na(iris.tmp) <- iris.tmp == 0

> iris.tmp
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1          NA           NA         0.2  setosa
2           4.9          NA          1.4         0.2  setosa
3           4.7          NA          1.3          NA  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4          NA  setosa
6           5.4         3.9           NA          NA  setosa
7            NA         3.4          1.4         0.3  setosa
8            NA         3.4           NA         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10           NA         3.1          1.5         0.1  setosa


> summary(iris.tmp)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width 
 Min.   :4.400   Min.   :2.900   Min.   :1.300   Min.   :0.1  
 1st Qu.:4.650   1st Qu.:3.100   1st Qu.:1.400   1st Qu.:0.2  
 Median :4.900   Median :3.400   Median :1.400   Median :0.2  
 Mean   :4.871   Mean   :3.343   Mean   :1.414   Mean   :0.2  
 3rd Qu.:5.050   3rd Qu.:3.500   3rd Qu.:1.450   3rd Qu.:0.2  
 Max.   :5.400   Max.   :3.900   Max.   :1.500   Max.   :0.3  
 NA's   :3.000   NA's   :3.000   NA's   :3.000   NA's   :3.0  
       Species  
 setosa    :10  
 versicolor: 0  
 virginica : 0  
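
Note that the comparison above is applied to every column of the data
frame, so a character column that happens to contain "0" may also
match. If you want to restrict the recoding to numeric columns, one
possible approach is sketched below. The data frame 'df' and its
columns are hypothetical and only serve as an illustration:

```r
# Hypothetical example data; 'g' is a character column containing "0"
df <- data.frame(x = c(1, 0, 3),
                 y = c(0, 5, 6),
                 g = c("a", "b", "0"),
                 stringsAsFactors = FALSE)

# TRUE for numeric columns only
num <- sapply(df, is.numeric)

# Recode zeros to NA in the numeric columns, leave the others untouched
df[num] <- lapply(df[num], function(col) replace(col, col == 0, NA))

df
```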



If you want a more generic approach to replacing values based upon
logical conditions, there is also the replace() function:

> iris.tmp$Sepal.Length <- with(iris.tmp, 
                                replace(Sepal.Length, 
                                        Sepal.Length > 5.0, 999))

> iris.tmp
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1         999.0          NA           NA         0.2  setosa
2           4.9          NA          1.4         0.2  setosa
3           4.7          NA          1.3          NA  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4          NA  setosa
6         999.0         3.9           NA          NA  setosa
7            NA         3.4          1.4         0.3  setosa
8            NA         3.4           NA         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10           NA         3.1          1.5         0.1  setosa


See ?replace for more information and note that replace() does not
modify its argument "in place"; it returns a modified copy, so you need
to assign the result.
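
To illustrate the point about assignment, here is a small sketch using
made-up vectors 'x' and 'y' (not objects from the session above):

```r
# Hypothetical example values, including one missing value
x <- c(4.6, 5.1, NA, 5.4)

# replace() returns a new vector; which() drops the NA from the index
y <- replace(x, which(x > 5), 999)

x   # unchanged, still contains 5.1 and 5.4
y   # the values greater than 5 are now 999, the NA is kept
```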

Finally, if you are reading in data sets from ASCII files using one of
the read.table() family of functions, take note of the 'na.strings'
argument, which will define the incoming values that you want to
explicitly set to missing (NA) during the import process.
See ?read.table for more information.
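
As a rough sketch of how 'na.strings' might be used, the example below
reads a small inline table through the 'text' argument of read.table()
(available in reasonably recent versions of R) rather than from a file.
The column names and values are invented for illustration:

```r
# Hypothetical data in which "." and "999" both stand for "missing"
raw <- "id score
1 10
2 .
3 999
4 7"

# Both codes are converted to NA during import
dat <- read.table(text = raw, header = TRUE,
                  na.strings = c(".", "999"))

dat$score
mean(dat$score, na.rm = TRUE)
```

Keep in mind that 'na.strings' applies to every column, so a code such
as 999 would also become NA in any column where it is a legitimate
value.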

HTH,

Marc Schwartz


