[R] non-intuitive behaviour after type conversion

Alan Kelly akelly at tcd.ie
Mon Nov 23 09:54:19 CET 2009


Dear list,
I have a data frame (birth) with mixed variables (numeric and  
alphanumeric).  One variable "t1stvisit" was originally coded as  
numeric with values 1,2, and 3.  After attaching the data frame, this   
is what I see when I use str(t1stvisit)

$ t1stvisit: int  1 1 1 1 1 1 1 1 2 2 ...

This is as expected.
I then convert t1stvisit to a factor and, to avoid creating a second
copy of this variable independent of the data frame, I use:
birth$t1stvisit = as.factor(birth$t1stvisit)
If I check whether the conversion has worked:
is.factor(t1stvisit)
[1] FALSE
Now the only object present in the workspace is the data frame "birth"
and, as noted, I have not created any new variables. So why does R
still treat t1stvisit as numeric?

Yet when I try the following:
 > is.factor(birth$t1stvisit)
[1] TRUE
So, there appear to be two versions of "t1stvisit": the original
numeric version and the correct factor version, although ls() only
shows "birth" as present in the workspace.
If I type:
 > summary(t1stvisit)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   1.000   1.000   2.000   1.574   2.000   3.000  29.000
I get the numeric version, but if I try
summary(birth$t1stvisit)
    1    2    3 NA's
  180  169   22   29
I get the factor version.
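
For anyone who wants to reproduce this without my data, here is a minimal sketch of the steps above. The small data frame is a made-up stand-in (the values are hypothetical, not my real data), and I am assuming the attach() call happened before the conversion, as it did in my session.

```r
# Hypothetical stand-in for my data frame; the values are invented.
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 2L, 3L, NA))

attach(birth)                                   # attach first, as I did
birth$t1stvisit <- as.factor(birth$t1stvisit)   # convert inside the data frame

plain_is_factor <- is.factor(t1stvisit)         # FALSE: still the integer version
df_is_factor    <- is.factor(birth$t1stvisit)   # TRUE: the factor version

print(plain_is_factor)
print(df_is_factor)

detach(birth)                                   # tidy up the search path
```

Run in a fresh session, this shows the same mismatch between t1stvisit and birth$t1stvisit that I describe above.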

Frankly, I feel that this behaviour is non-intuitive and potentially
problematic. Nor have I seen warnings about this in the various
textbooks on R.
Can anyone comment on why this should occur?
Many thanks,
Alan Kelly

Dr. Alan Kelly
Department of Public Health & Primary Care
Trinity College Dublin



