[R] How to recode a factor level (within the list)?

Lauri Nikkinen lauri.nikkinen at iki.fi
Sun Dec 2 21:07:34 CET 2007


It seems your previous solutions did not fully solve my problem after
all, so I still need some more advice on the subject. I edited DF so
that every variable now contains missing values (NAs).

y1 <- rnorm(10) + 6.8
y2 <- rnorm(10) + (1:10*1.7 + 1)
y3 <- rnorm(10) + (1:10*6.7 + 3.7)
y <- c(y1,y2,y3)
x <- rep(1:3,10)
f <- gl(2,15, labels=paste("lev", 1:2, sep=""))
g <- seq(as.Date("2000/1/1"), by="day", length=30)
DF <- data.frame(x=x,y=y, f=f, g=g)
DF$g[DF$x == 1] <- NA
DF$x[3:6] <- NA
DF$y[1] <- NA
DF$f[1] <- NA
DF$wdays <- weekdays(DF$g)

DF

Is it possible to write a function, applied with lapply, that returns
those frequency data.frames as a list and replaces all NAs with the
word "missing" in every variable, even though the variables have
different classes?

> sapply(DF, class)
          x           y           f           g       wdays
  "integer"   "numeric"    "factor"      "Date" "character"
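
To make the request concrete, here is a minimal sketch of what such a
function might look like. It assumes it is acceptable to convert every
column to character before tabulating, so that integer, numeric, factor,
Date and character columns can all be treated the same way. The helper
name freq_with_missing, its label argument and the small DF2 example are
made up for illustration and do not come from the replies quoted below.

```r
# Hypothetical helper (not from the thread): tabulate one column and
# count its NAs under a visible label instead of dropping them.
freq_with_missing <- function(col, label = "missing") {
  z <- as.character(col)   # one common representation for every class
  z[is.na(z)] <- label     # recode NA before tabulating
  as.data.frame(table(z, dnn = "value"),
                responseName = "Freq",
                stringsAsFactors = FALSE)
}

# Small illustrative data set with NAs in columns of different classes
DF2 <- data.frame(
  x = c(1L, NA, 3L, 1L),
  f = factor(c("lev1", NA, "lev2", "lev1")),
  g = as.Date(c("2000-01-01", NA, "2000-01-03", NA))
)
DF2$wdays <- weekdays(DF2$g)   # NA dates give NA weekdays

# One frequency data.frame per variable, collected in a list
res <- lapply(DF2, freq_with_missing)
res
```

Converting to character is a design choice: it avoids touching factor
levels or Date values in DF itself, at the cost of sorting the table
rows alphabetically rather than numerically or by date.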

Really appreciate the help,

Lauri


2007/12/2, Prof Brian Ripley <ripley at stats.ox.ac.uk>:
> On Sun, 2 Dec 2007, Lauri Nikkinen wrote:
>
> > #Dear R-users,
> > #I have a data.frame like this:
> >
> > y1 <- rnorm(10) + 6.8
> > y2 <- rnorm(10) + (1:10*1.7 + 1)
> > y3 <- rnorm(10) + (1:10*6.7 + 3.7)
> > y <- c(y1,y2,y3)
> > x <- rep(1:3,10)
> > f <- gl(2,15, labels=paste("lev", 1:2, sep=""))
> > g <- seq(as.Date("2000/1/1"), by="day", length=30)
> > DF <- data.frame(x=x,y=y, f=f, g=g)
> > DF$g[DF$x == 1] <- NA
> > DF$x[3:6] <- NA
> > DF$wdays <- weekdays(DF$g)
> >
> > DF
> >
> > #Frequencies
> > g <- lapply(DF, function(x) as.data.frame(table(format(x))))
> > g
> >
> > #NAs are now part of the factor levels. How can I recode NAs into e.g. "missing"?
>
> Not so:
>
> > sapply(DF, class)
>           x           y           f           g       wdays
>   "integer"   "numeric"    "factor"      "Date" "character"
>
> and DF$f does not have any NA levels.
>
> The place you may think you have got NAs is in format(wdays):
> they are not NA nor "NA" but "NA     ".  I am not sure what exactly you
> want (NA is not appearing in the tables: see the 'exclude' argument),
> but perhaps
>
> lapply(DF, function(x) {
>   if(is.character(x)) x[is.na(x)] <- "missing"
>   as.data.frame(table(format(x)))
> })
>
> or
>
> lapply(DF, function(x) {
>   z <- table(format(x))
>   names(z)[grep("^NA", names(z))] <- "missing"
>   as.data.frame(z)
> })
>
> or
>
> lapply(DF, function(x) {
>   z <- table(x, exclude=character(0))
>   names(z)[is.na(names(z))] <- "missing"
>   as.data.frame(z)
> })
>
> ?
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
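
The point in the quoted reply about format() can be checked with a small
example. This is only an illustrative sketch: the vector w is made up
and stands in for DF$wdays.

```r
# Hypothetical stand-in for DF$wdays: a character vector with one NA
w <- c("Monday", NA, "Tuesday")

fw <- format(w)   # pads every element to a common width
fw

# After format() nothing is a real NA any more, so is.na() finds none
anyNA(w)    # TRUE for the raw vector
anyNA(fw)   # FALSE for the formatted one

# The former NA is now a string starting with "NA" followed by padding,
# which is why a pattern such as "^NA" is needed to locate it
grepl("^NA", fw)
```

This is why the first quoted suggestion replaces NAs before calling
format(), while the second one matches on "^NA" afterwards.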


