[R] Losing factor levels when moving variables from one context to another

Thu Feb 1 18:51:10 CET 2007

On Thu, 2007-02-01 at 12:13 -0500, Michael Rennie wrote:
> Hi, there
> 
> I'm currently trying to figure out how to keep my "factor" levels for a 
> variable when moving it from one data frame or matrix to another.
> 
> Example below:
> 
> vec1<-(rep("10",5))
> vec2<-(rep("30",5))
> vec3<-(rep("80",5))
> vecs<-c(vec1, vec2, vec3)
> 
> resp<-rnorm(15,2)
> 
> dat<-as.data.frame(cbind(resp, vecs))
> dat$vecs<-factor(dat$vecs)
> dat
> 
> R returns:
>                    resp  vecs
> 1     1.57606068767956   10
> 2     2.30271782269308   10
> 3     2.39874788444542   10
> 4    0.963987738423353   10
> 5     2.03620782454740   10
> 6  -0.0706713324725649   30
> 7     1.49001721222926   30
> 8     2.00587718501980   30
> 9    0.450576585429981   30
> 10    2.87120375367357   30
> 11    2.25575058079324   80
> 12    2.03471288724508   80
> 13    2.67432066972984   80
> 14    1.74102136279177   80
> 15    2.29827581276955   80
> 
> and now:
> 
> newvar<-(rnorm(15,4))
> newdat<-as.data.frame(cbind(newvar, dat$vecs))
> newdat
> 
> R returns:
> 
>        newvar V2
> 1  4.300788  1
> 2  5.295951  1
> 3  5.099849  1
> 4  3.211045  1
> 5  3.703554  1
> 6  3.693826  2
> 7  5.314679  2
> 8  4.222270  2
> 9  3.534515  2
> 10 4.037401  2
> 11 4.476808  3
> 12 4.842449  3
> 13 3.109677  3
> 14 4.752961  3
> 15 4.445216  3
>  >
> 
> I seem to have lost everything I once had associated with "vecs", and it's 
> turned my actual values into arbitrary groupings.
> 
> I assume this has something to do with the behaviour of factors? Does 
> anyone have any suggestions on how to get my original levels, etc., back?
> 
> Cheers,
> 
> Mike

Mike,

The problem (specific to your example) is that you are calling
as.data.frame() on the result of cbind(). cbind() first coerces the
columns to a common data type and creates a matrix, which
as.data.frame() then coerces to a data frame.

Thus, in the second case, your factor dat$vecs is first being coerced to
its underlying integer codes (1, 2, 3), rather than being retained as a
factor, since a matrix can contain only one data type and the first
column is numeric.
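
To make the coercion concrete, here is a minimal sketch with a small
made-up factor (the names f, m and recovered are just for illustration,
not from your data). It shows the codes that cbind() keeps, and one way
to get the original labels back from them, assuming you still have the
levels of the original factor at hand:

```r
# A factor stores integer codes plus a "levels" attribute.
f <- factor(c("10", "10", "30", "80"))

# cbind() builds a numeric matrix here, so only the codes survive.
m <- cbind(x = c(1.5, 2.5, 3.5, 4.5), f)
m[, "f"]

# Recover the original labels by indexing the levels with the codes.
recovered <- factor(levels(f)[m[, "f"]], levels = levels(f))
identical(as.character(recovered), as.character(f))
```

This only works because the codes in the matrix still line up with
levels(f); if the levels themselves are gone, the labels cannot be
reconstructed from the codes alone.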

Try this instead:

vec1<-(rep("10", 5))
vec2<-(rep("30", 5))
vec3<-(rep("80", 5))
vecs<-c(vec1, vec2, vec3)

set.seed(1)
resp<-rnorm(15, 2)

dat <- data.frame(resp, vecs)

> str(dat)
'data.frame':	15 obs. of  2 variables:
 $ resp: num  1.37 2.18 1.16 3.60 2.33 ...
 $ vecs: Factor w/ 3 levels "10","30","80": 1 1 1 1 1 2 2 2 2 2 ...

set.seed(2)
newvar <- rnorm(15, 4)
newdat <- data.frame(newvar, dat$vecs)

> str(newdat)
'data.frame':	15 obs. of  2 variables:
 $ newvar  : num  3.10 4.18 5.59 2.87 3.92 ...
 $ dat.vecs: Factor w/ 3 levels "10","30","80": 1 1 1 1 1 2 2 2 2 2 ...

> all(levels(newdat$dat.vecs) == levels(dat$vecs))
[1] TRUE
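
One caveat if you run this on a recent R: as far as I know, since R
4.0.0 data.frame() defaults to stringsAsFactors = FALSE, so a character
vector like vecs would come through as chr rather than Factor. A minimal
sketch that makes the conversion explicit, so it should behave the same
either way (the column name vecs in newdat is my choice, used to avoid
the automatic dat.vecs name):

```r
vecs <- rep(c("10", "30", "80"), each = 5)

set.seed(1)
resp <- rnorm(15, 2)

# Convert explicitly rather than relying on the data.frame() default.
dat <- data.frame(resp, vecs = factor(vecs))

set.seed(2)
newvar <- rnorm(15, 4)

# Passing the factor column to data.frame() keeps it a factor.
newdat <- data.frame(newvar, vecs = dat$vecs)

is.factor(newdat$vecs)
levels(newdat$vecs)
```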

BTW, there may very well be times when you are combining two factors
and need to either keep their levels intentionally different or
"relevel" the combined factors onto a common set of levels. See the
Warning and other information in ?factor. This would be critical, for
example, if you are combining data sets to then run modeling functions
on the combined data sets.
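
As a rough illustration of the releveling case, here is one possible
approach (not the only one) with two small made-up factors f1 and f2.
It goes through character so that labels, not integer codes, are
combined, and then rebuilds the factor on the union of both level sets:

```r
f1 <- factor(c("10", "30"))
f2 <- factor(c("30", "80"))

# The same label can carry different codes in different factors:
# "30" is code 2 in f1 but code 1 in f2.
as.integer(f1)
as.integer(f2)

# Combine the labels, then define the levels once for the result.
all_levels <- union(levels(f1), levels(f2))
combined <- factor(c(as.character(f1), as.character(f2)),
                   levels = all_levels)

levels(combined)
```

Depending on your R version, c(f1, f2) on its own may return bare
integer codes instead of a factor, so converting to character first is
the safer habit.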

HTH,

Marc Schwartz