[R] imputing the numerical columns of a dataframe, returning the rest unchanged

Yihui Xie xieyihui at gmail.com
Wed Dec 24 06:46:24 CET 2008


Hi,

?sapply will tell you

....
     'sapply' is a user-friendly version of 'lapply' by default
     returning a vector or matrix if appropriate.
....

so each column 'x' loses its class when sapply() simplifies the result; e.g.

## iris is a data.frame
> str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
## but sapply() will coerce it into a numeric matrix
> str(sapply(iris, function(x)x))
 num [1:150, 1:5] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" ...
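
For comparison, lapply() returns a list and leaves each column's class
alone; assigning that list into DF[] keeps the data.frame structure. A
minimal sketch (the name iris2 is only for illustration):

```r
## copy iris so the original stays untouched
iris2 <- iris

## lapply() returns a list with one element per column; assigning into
## iris2[] replaces the columns but keeps the data.frame itself
iris2[] <- lapply(iris2, function(x) x)

## each column still has its own class, including the factor
classes <- sapply(iris2, class)
print(classes)
```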

I'd suggest you first find out which columns are numeric, then apply
impute() to just those columns (i.e. DF[, sapply(DF, is.numeric)])
and assign the new values back to the original columns. Testing with
is.numeric() rather than class(x) == "numeric" also catches integer
columns.
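
A rough sketch of that approach in base R. The data frame below is made
up to resemble the one in the question, and impute_median() is a
hypothetical stand-in for impute() (whose package the original post does
not name), so treat this as an illustration rather than the exact
solution:

```r
## made-up example data resembling the question (assumption)
DF <- data.frame(
  day     = factor(c("4", "4", "4")),
  version = factor(c("A", "A", "B")),
  ID      = c("1a", "2a", "3a"),
  V1      = c(1, NA, 3),
  V2      = c(5, 3, NA),
  stringsAsFactors = FALSE
)

## hypothetical stand-in for impute(): replace NAs by the column median
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

## logical vector with one entry per column: TRUE for numeric columns
num_cols <- sapply(DF, is.numeric)

## impute only the numeric columns and assign them back in place;
## factors and characters are never touched
DF[num_cols] <- lapply(DF[num_cols], impute_median)

str(DF)
```

Because only the selected columns are replaced, the factor and character
columns keep their classes and the result is still a data.frame.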

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China



On Mon, Dec 22, 2008 at 11:38 PM, Mark Heckmann <mark.heckmann at gmx.de> wrote:
> Hi R-experts,
>
> how can I apply a function to each numeric column of a data frame and return
> the whole data frame with changes in numeric columns only?
> In my case I want to do a median imputation of the numeric columns and
> retain the other columns. My dataframe (DF) contains factors, characters and
> numerics.
>
> I tried the following but that does not work:
>
> foo <- function(x){
>  if(is.numeric(x)==TRUE) return(impute(x))
>  else(return(x))
> }
>
> sapply(DF, foo)
>
>      day version     ID     V1     V2  V3
>  [1,] "4" "A"       "1a"     "1"   "5"  "5"
>  [2,] "4" "A"       "2a"     "2"   "3"  "5"
>  [3,] "4" "B"       "3a"     "3"   "5"  "5"
>
> All the variables are coerced to characters now ("day" and "version" were
> factors, "ID" a character). I only want imputations on the numerics, with
> the rest returned unchanged.
>
> Is there a function available? If not, how can I do it?
>
> TIA and merry x-mas,
> Mark
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>