[R] how to replace NA with a specific score that is dependent on another indicator variable

David Winsemius dwinsemius at comcast.net
Wed Sep 1 15:55:27 CEST 2010


On Sep 1, 2010, at 9:20 AM, Chris Howden wrote:

> Hi everyone,
>
>
>
> I’m looking for a clever bit of code to replace NA’s with a specific score
> depending on an indicator variable.
>
> I can see how to do it using lots of if statements, but I’m sure there must
> be a neater, better way of doing it.
>
> Any ideas at all will be much appreciated, I’m dreading coding up all those
> if statements!!!!!
>
> My problem is as follows:
>
> I have a data set with lots of missing data:
>
> EG Raw Data Set
>
> Category   variable1   variable2   variable3
>     1          5          NA          NA
>     1         NA           3           4
>     2         NA           7          NA

This does not do its work by category (since I got tired of fixing  
mangled htmlized datasets) but it seems to me that a tapply "wrap"  
could do either of these operations within categories:


 > egraw
   Category variable1 variable2 variable3
1        1         5        NA        NA
2        1        NA         3         4
3        2        NA         7        NA

 > lapply(egraw, function(x) {
       mnx <- mean(x, na.rm=TRUE)   # column mean, ignoring NAs
       sapply(x, function(z) if (is.na(z)) mnx else z)
   })
$Category
[1] 1 1 2

$variable1
[1] 5 5 5

$variable2
[1] 5 3 7

$variable3
[1] 4 4 4

 > sapply(egraw, function(x) {
       mnx <- mean(x, na.rm=TRUE)   # column mean, ignoring NAs
       sapply(x, function(z) if (is.na(z)) mnx else z)
   })
      Category variable1 variable2 variable3
[1,]        1         5         5         4
[2,]        1         5         3         4
[3,]        2         5         7         4
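
For the within-category version, one possible sketch uses ave() rather than
tapply(), since ave() returns a vector of the same length as its input. The
helper name fill_group_mean is made up for illustration, and the data frame is
rebuilt from the three rows shown above. Note that a category with no observed
values for a variable (here, Category 2 for variable1 and variable3) has no
mean to borrow, so those cells come back as NaN.

```r
# Rebuild the example data (values taken from the post above)
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))

# Hypothetical helper: replace each NA in x with the mean of the
# non-missing values of x within the same level of g
fill_group_mean <- function(x, g) {
  ave(x, g, FUN = function(v) {
    v[is.na(v)] <- mean(v, na.rm = TRUE)
    v
  })
}

# Apply the helper to every column except the grouping column
vars <- setdiff(names(egraw), "Category")
imputed <- egraw
imputed[vars] <- lapply(egraw[vars], fill_group_mean, g = egraw$Category)
imputed
```

With a larger data set the same call should work unchanged, as long as the
grouping column is named Category.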

>
>    etc
>
> Now I want to replace the NA’s with the average for each category, so if
> these averages were:
>
> EG Averages
>
> Category   variable1   variable2   variable3
>     1         4.5         3.2         2.5
>     2         3.5         7.4         5.9
>
>
>
> So I’d like my data set to look like the following once I’ve replaced the
> NA’s with the appropriate category average:
>
> EG Imputed Data Set
>
> Category   variable1   variable2   variable3
>     1          5          3.2         2.5
>     1         4.5          3           4
>     2         3.5          7          5.9
>
>    etc
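
If the category averages are supplied in a table like the one quoted above
(rather than computed from the data), a lookup with match() avoids the if
statements entirely. This is only a sketch: the object names avgs, idx and
imputed are assumptions, and both data frames are typed in from the quoted
example.

```r
# Example data and the quoted category averages
egraw <- data.frame(Category  = c(1, 1, 2),
                    variable1 = c(5, NA, NA),
                    variable2 = c(NA, 3, 7),
                    variable3 = c(NA, 4, NA))
avgs  <- data.frame(Category  = c(1, 2),
                    variable1 = c(4.5, 3.5),
                    variable2 = c(3.2, 7.4),
                    variable3 = c(2.5, 5.9))

vars <- setdiff(names(egraw), "Category")

# For each row of egraw, the row of avgs holding its category's averages
idx <- match(egraw$Category, avgs$Category)

imputed <- egraw
for (v in vars) {
  miss <- is.na(imputed[[v]])                  # which cells need filling
  imputed[[v]][miss] <- avgs[[v]][idx[miss]]   # take the matching average
}
imputed
```

On the three example rows this reproduces the "EG Imputed Data Set" values
quoted above.
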
>
> Any ideas would be very much appreciated!!!!!

You might add reading the Posting Guide and setting up your mail reader to
post in plain text to your TODO list.
>
> thankyou
>
> Chris Howden

> .

David Winsemius, MD
West Hartford, CT


