[R] evaluating NAs in a dataframe

David Winsemius dwinsemius at comcast.net
Wed Dec 8 22:19:33 CET 2010


On Dec 8, 2010, at 3:10 PM, Wade Wall wrote:

> Hi all,
>
> How can one evaluate NAs in a numeric dataframe column?  For example, I have
> a dataframe (demo) with a column of numbers and several NAs. If I write
> demo.df >= 10, numerals will return TRUE or FALSE, but if the value is
> "NA", "NA" is returned.  But if I write demo.df == "NA", it returns as "NA"
> also.  I know that I can remove NAs, but would like to keep the dataframe as
> is without creating a subset.  I basically want to add a line that evaluates
> the NA in the demo dataframe.

That loop (quoted below) looks really, really painful. Why not use the
function findInterval and then do a lookup in a character vector? Then you
can throw away that loopy construct completely.

 > demo  <- data.frame(Area = runif(10, 0, 100))
 > demo$catarea <- findInterval(demo$Area, c(0,25,50,75,100))
 > demo
         Area catarea
1  71.440401       3
2   8.438097       1
3  45.492178       2
4  50.669996       3
5  15.444114       1
6  33.954948       2
7  19.738747       1
8  56.485654       3
9  29.218921       2
10 74.204611       3
 > demo$catname <- c("S01","S02", "S03","S04")[demo$catarea]
 > demo
         Area catarea catname
1  71.440401       3     S03
2   8.438097       1     S01
3  45.492178       2     S02
4  50.669996       3     S03
5  15.444114       1     S01
6  33.954948       2     S02
7  19.738747       1     S01
8  56.485654       3     S03
9  29.218921       2     S02
10 74.204611       3     S03
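
The same approach carries over to the ten classes in the quoted loop and
leaves the NAs in place. A minimal sketch, assuming the break points are the
lower bounds used in that loop; the data values and the names breaks and
labels are made up for illustration:

```r
# Hypothetical data: a few areas (cm2) with missing values mixed in
demo <- data.frame(Area = c(5, 12, NA, 60, 150, NA, 3500))

# Lower bounds of the ten classes in the quoted loop (assumed from its conditions)
breaks <- c(0, 10, 25, 50, 100, 200, 400, 800, 1600, 3200)
labels <- sprintf("S%02d", 1:10)   # "S01" ... "S10"

# findInterval() returns NA for NA input, and indexing with NA yields NA,
# so missing areas stay missing without any special handling
demo$Class <- labels[findInterval(demo$Area, breaks)]
demo
```

Two differences from the loop: an Area of exactly 0 lands in class 1 (the
loop required > 0), and a negative value would get index 0, which the lookup
drops, so this assumes areas are non-negative.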

-- 
David.
>
> As an example, I want to assign rows to classes based on values in
> demo$Area. Some of the values in demo$Area are "NA"
>
> for (i in 1:nrow(demo)) {
>  if (demo$Area[i] > 0 && demo$Area[i] < 10) {Class[i]<-"S01"} ## 1-10 cm2
>  if (demo$Area[i] >= 10 && demo$Area[i] < 25) {Class[i] <- "S02"} ## 10-25cm2
>  if (demo$Area[i] >= 25 && demo$Area[i] < 50) {Class[i] <-"S03"} ## 25-50 cm2
>  if (demo$Area[i] >= 50 && demo$Area[i] < 100) {Class[i] <-"S04"} ## 50-100 cm2
>  if (demo$Area[i] >= 100 && demo$Area[i] < 200) {Class[i] <- "S05"} ## 100-200 cm2
>  if (demo$Area[i] >= 200 && demo$Area[i] < 400) {Class[i] <- "S06"} ## 200-400 cm2
>  if (demo$Area[i] >= 400 && demo$Area[i] < 800) {Class[i] <- "S07"} ## 400-800 cm2
>  if (demo$Area[i] >= 800 && demo$Area[i] < 1600) {Class[i] <- "S08"} ## 800-1600 cm2
>  if (demo$Area[i] >= 1600 && demo$Area[i] < 3200) {Class[i] <- "S09"} ## 1600-3200 cm2
>  if (demo$Area[i] >=3200) {Class[i] <- "S10"} ## >3200 cm2
>  }
>
> What happens is that I get the message "Error in if (demo$Area[i] > 0 &&
> demo$Area[i] < 10) { : missing value where TRUE/FALSE needed"
>
> Thanks for any help
>
> Wade
>
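
On the error itself: a comparison involving NA evaluates to NA, and if()
stops when its condition is NA. Missing values are tested with is.na(), not
with == "NA". A small sketch with made-up values (x and cls are illustrative
names), in case the loop is kept:

```r
x <- c(5, NA, 30)   # hypothetical values, one of them missing

x >= 10        # the missing element compares to NA, not FALSE
x == "NA"      # "NA" is just a string here; the missing element is still NA
is.na(x)       # the actual test for missingness

# if() needs a single TRUE or FALSE, so check is.na() before comparing
cls <- character(length(x))
for (i in seq_along(x)) {
  if (is.na(x[i])) {
    cls[i] <- NA          # keep the row, leave its class missing
  } else if (x[i] >= 10) {
    cls[i] <- "large"
  } else {
    cls[i] <- "small"
  }
}
cls
```

The vectorised findInterval lookup avoids the issue entirely, since NA simply
propagates through it.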

David Winsemius, MD
West Hartford, CT


