[R] how to replace NA with a specific score that is dependent on another indicator variable

David Winsemius dwinsemius at comcast.net
Wed Sep 1 16:19:05 CEST 2010


On Sep 1, 2010, at 9:55 AM, David Winsemius wrote:

>
> On Sep 1, 2010, at 9:20 AM, Chris Howden wrote:
>
>> Hi everyone,
>>
>> I’m looking for a clever bit of code to replace NAs with a specific
>> score depending on an indicator variable.
>>
>> I can see how to do it using lots of if statements, but I’m sure there
>> must be a neater, better way of doing it.
>>
>> Any ideas at all will be much appreciated; I’m dreading coding up all
>> those if statements!!!!!
>>
>> My problem is as follows:
>>
>> I have a data set with lots of missing data:
>>
>> EG Raw Data Set
>>
>> Category variable1 variable2 variable3
>>        1         5        NA        NA
>>        1        NA         3         4
>>        2        NA         7        NA
>
> This does not do its work by category (since I got tired of fixing  
> mangled htmlized datasets) but it seems to me that a tapply "wrap"  
> could do either of these operations within categories:
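A minimal base-R sketch of the within-category version, assuming the three example rows from the question (rebuilt here as a hypothetical `egraw` data frame). Rather than wrapping `tapply()` directly, it swaps in `ave()`, which splits a vector by a grouping variable, applies `FUN` to each piece and returns the results in the original row order; `fill_mean` is a hypothetical helper name.

```r
# Hypothetical data frame rebuilt from the three example rows in the question
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# Hypothetical helper: replace the NAs in a vector with the mean of its
# non-missing values (NaN if every value is NA)
fill_mean <- function(v) replace(v, is.na(v), mean(v, na.rm = TRUE))

# For each variable column, ave() splits it by Category, applies fill_mean
# to each piece and puts the results back in the original row order
imputed <- egraw
imputed[-1] <- lapply(egraw[-1], function(x) ave(x, egraw$Category, FUN = fill_mean))
print(imputed)
```

A category whose values are all NA (variable1 and variable3 in category 2 here) comes back as NaN, because `mean()` of zero non-missing values is NaN; such cells would need a separate fallback rule.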

Why not try out Hadley's plyr package?

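require(plyr)
ddply(egraw2, .(category), .fun = function(df) {
  # For each column except 'category', fill NAs with that
  # column's mean within the current category
  sapply(df[-1], function(x) {
    mnx <- mean(x, na.rm = TRUE)
    sapply(x, function(z) if (is.na(z)) mnx else z)
  })
})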

Tested on
egraw2 <- data.frame(category = rep(1:4, 4),
                     var1 = sample(c(1:3, NA, NA), 16, replace = TRUE),
                     var2 = sample(c(5:10, NA, NA), 16, replace = TRUE),
                     var3 = sample(c(15:20, NA, NA), 16, replace = TRUE))

-- 
David.
>
>
> > egraw
>  Category variable1 variable2 variable3
> 1        1         5        NA        NA
> 2        1        NA         3         4
> 3        2        NA         7        NA
>
> > lapply(egraw, function(x) {mnx <- mean(x, na.rm=TRUE)
>                             sapply(x, function(z) if (is.na(z)) {mnx} else {z})
>                            }
>         )
> $Category
> [1] 1 1 2
>
> $variable1
> [1] 5 5 5
>
> $variable2
> [1] 5 3 7
>
> $variable3
> [1] 4 4 4
>
> > sapply(egraw, function(x) {mnx <- mean(x, na.rm=TRUE)
>                             sapply(x, function(z) if (is.na(z)) {mnx} else {z})
>                            }
>              )
>     Category variable1 variable2 variable3
> [1,]        1         5         5         4
> [2,]        1         5         3         4
> [3,]        2         5         7         4
>
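Both calls above fill each column with its overall mean, ignoring Category, and `sapply()` also simplifies the result to a matrix. A minimal sketch of the same column-wise fill that keeps `egraw` a data frame, assuming the same three rows; it swaps the element-by-element if/else for the vectorised `replace()`.

```r
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# Assigning into filled[] keeps the data.frame class and column names,
# whereas sapply() would simplify the result to a matrix
filled <- egraw
filled[] <- lapply(egraw, function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))
print(filled)
```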
>>
>>   etc
>>
>> Now I want to replace the NAs with the average for each category, so if
>> these averages were:
>>
>> EG Averages
>>
>> Category variable1 variable2 variable3
>>        1       4.5       3.2       2.5
>>        2       3.5       7.4       5.9
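If the averages are not known in advance, they can be computed from the data. A minimal sketch using base R's `aggregate()`, assuming the three rows shown earlier in the question (a hypothetical `egraw`); `na.rm = TRUE` is passed through to `mean()`, so each average uses only the non-missing values within that category.

```r
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# One row per Category, one column per variable; extra arguments
# (here na.rm = TRUE) are passed on to FUN
averages <- aggregate(egraw[-1], by = list(Category = egraw$Category),
                      FUN = mean, na.rm = TRUE)
print(averages)
```

The figures in the table above appear to be illustrative and to come from more rows than the three shown, so they will not match this output; with only these rows, category 2 has no non-missing values for variable1 or variable3, and those averages are NaN.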
>>
>>
>>
>> So I’d like my data set to look like the following once I’ve replaced
>> the NAs with the appropriate category average:
>>
>> EG Imputed Data Set
>>
>> Category variable1 variable2 variable3
>>        1         5       3.2       2.5
>>        1       4.5         3         4
>>        2       3.5         7       5.9
>>
>>   etc
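When the per-category scores are already known, as in the EG Averages table, the problem in the subject line reduces to a lookup: `match()` finds the row of the score table for each observation's Category, and only the NA cells are overwritten, so no if statements are needed. A minimal sketch, assuming the scores are stored in a data frame laid out like that table; `egraw`, `scores` and `idx` are hypothetical names.

```r
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# Hypothetical score table copied from the "EG Averages" example
scores <- data.frame(Category  = c(1, 2),
                     variable1 = c(4.5, 3.5),
                     variable2 = c(3.2, 7.4),
                     variable3 = c(2.5, 5.9))

# For each row of egraw, the row of scores with the same Category
idx <- match(egraw$Category, scores$Category)

imputed <- egraw
for (v in setdiff(names(egraw), "Category")) {
  miss <- is.na(imputed[[v]])
  # Overwrite only the missing cells with that row's category score
  imputed[[v]][miss] <- scores[[v]][idx[miss]]
}
print(imputed)
```

Run on the three example rows, this reproduces the EG Imputed Data Set above. Any data frame with a Category column and matching variable names (for example, averages computed from the data) can serve as `scores`.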
>>
>> Any ideas would be very much appreciated!!!!!
>
> You might add reading the Posting Guide and setting up your reader to
> post in plain text to your TODO list.
>>
>> Thank you
>>
>> Chris Howden
>
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


