[R] Mean or mode imputation for missing values

francesca casalino francy.casalino at gmail.com
Tue Oct 11 17:49:44 CEST 2011


Yes, thank you Gu…
I am just trying to do this as a rough first step and will try other,
more appropriate imputation methods later.
I am just learning R, and was trying to write the for loop and
if-statement by hand, but something is going wrong…

This is what I have until now:

*****fake array:
age<- c(5,8,10,12,NA)
a<- factor(c("aa", "bb", NA, "cc", "cc"))
b<- c("banana", "apple", "pear", "grape", NA)
df_test <- data.frame(age=age, a=a, b=b)
df_test$b<- as.character(df_test$b)

for (var in 1:ncol(df_test)) {
	if (class(df_test$var)=="numeric") {
		df_test$var[is.na(df_test$var)] <- mean(df_test$var, na.rm = TRUE)
		} else if (class(df_test$var)=="character") {
		Mode(df_test$var[is.na(df_test$var)], na.rm = TRUE)
		}
}

Where 'Mode' is the function:

function (x, na.rm)
{
    xtab <- table(x)
    xmode <- names(which(xtab == max(xtab)))
    if (length(xmode) > 1)
        xmode <- ">1 mode"
    return(xmode)
}
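
For reference, here is a small sketch of how this function behaves on a few inputs. It assumes the definition above is assigned to the name Mode; the input vectors are made up for illustration. Two things it suggests: the na.rm argument is never used (table() already drops NA by default), and a column whose values are all distinct yields the placeholder ">1 mode" rather than a real value.

```r
# The function from the post, assigned to a name so it can be called.
Mode <- function(x, na.rm) {
  xtab <- table(x)                           # table() ignores NA by default
  xmode <- names(which(xtab == max(xtab)))   # all values tied for the top count
  if (length(xmode) > 1)
    xmode <- ">1 mode"                       # placeholder when there is a tie
  return(xmode)
}

# Illustrative inputs (not from the original data).
single_mode <- Mode(c("cc", "cc", "aa", NA))   # one clear winner
tied_mode   <- Mode(c("banana", "apple"))      # every value occurs once

print(single_mode)
print(tied_mode)
```

If the tie placeholder were written into the data, the missing entries would end up holding the literal string ">1 mode", so a tie-breaking rule may be preferable for imputation.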


It seems as if it is just ignoring the statements, though, without giving
any error… Does anybody have any idea what is going on?
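
A likely explanation, offered as a sketch rather than a definitive fix: inside the loop, df_test$var looks for a column literally named "var", which does not exist, so it returns NULL and neither branch of the if-statement is taken. In addition, the Mode call is applied only to the NA entries and its result is never assigned, and the factor column is not covered by either branch. The version below indexes columns with [[ ]] instead. The helper mode_value is hypothetical (not from the post), and its tie-breaking rule (first value in table() order) is an assumption.

```r
# Hypothetical helper: most frequent non-NA value of x.
# Assumption: ties are broken by taking the first entry in table() order.
mode_value <- function(x) {
  xtab <- table(x)                 # NA is dropped by default
  names(xtab)[which.max(xtab)]
}

# Same test data as in the post.
age <- c(5, 8, 10, 12, NA)
a <- factor(c("aa", "bb", NA, "cc", "cc"))
b <- c("banana", "apple", "pear", "grape", NA)
df_test <- data.frame(age = age, a = a, b = b, stringsAsFactors = FALSE)

for (var in names(df_test)) {
  col <- df_test[[var]]            # [[var]] uses the value of var; $var does not
  if (is.numeric(col)) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
  } else if (is.character(col) || is.factor(col)) {
    col[is.na(col)] <- mode_value(col)
  }
  df_test[[var]] <- col            # write the imputed column back
}

print(df_test)
```

Note that a column with no observed values at all would make mode_value return an empty result and the assignment fail, so such columns would need separate handling.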

Thank you very much for all the great help!
-f

2011/10/11 Weidong Gu <anopheles123 at gmail.com>:
> In your case, it may not be sensible to simply fill missing values with
> the mean or mode, as multiple imputation has become the norm these days. For
> your specific question, na.roughfix in the randomForest package would do
> the job.
>
> Weidong Gu
>
> On Tue, Oct 11, 2011 at 8:11 AM, francesca casalino
> <francy.casalino at gmail.com> wrote:
>> Dear R experts,
>>
>> I have a large database made up of mixed data types (numeric,
>> character, factor, ordinal factor) with missing values, and I am
>> looking for a package that would help me impute the missing values
>> using either the mean if numerical or the mode if character/factor.
>>
>> I could maybe use replacement like this:
>> df$var[is.na(df$var)] <- mean(df$var, na.rm = TRUE)
>> And go through all the many different variables of the datasets using
>> mean or mode for each, but I was wondering if there was a faster way,
>> or if a package existed to automate this (by doing 'mode' if it is a
>> factor or character or 'mean' if it is numeric)?
>>
>> I have tried the package "dprep" because I wanted to use the function
>> "ce.mimp", but unfortunately it is not available anymore.
>>
>> Thank you for your help,
>> -francy
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


