[R] non-intuitive behaviour after type conversion

Peter Ehlers ehlers at ucalgary.ca
Mon Nov 23 13:34:04 CET 2009


Alan Kelly wrote:
> Dear list,
> I have a data frame (birth) with mixed variables (numeric and 
> alphanumeric).  One variable "t1stvisit" was originally coded as numeric 
> with values 1,2, and 3.  After attaching the data frame, this  is what I 
> see when I use str(t1stvisit)
Actually, str(birth), I suspect, but that's not important.
> 
> $ t1stvisit: int  1 1 1 1 1 1 1 1 2 2 ...
> 
> This is as expected.
> I then convert t1stvisit to a factor and to avoid creating a second copy 
> of this variable independent of the data frame I use:
> birth$t1stvisit = as.factor(birth$t1stvisit)
> if I check that the conversion has worked:
> is.factor(t1stvisit)
> [1] FALSE
> Now the only object present in the workspace is the data frame "birth" 
> and, as noted, I have not created any new variables. So why does R 
> still treat t1stvisit as numeric?
> is.factor(t1stvisit)
> [1] FALSE
> 
> Yet when I try the following:
>  > is.factor(birth$t1stvisit)
> [1] TRUE
> So, there appear to be two versions of "t1stvisit": the original 
> numeric version and the correct factor version, although ls() only shows 
> "birth" as present in the workspace.
> If I type:
>  > summary(t1stvisit)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
>   1.000   1.000   2.000   1.574   2.000   3.000  29.000
> I get the numeric version, but if I try
> summary(birth$t1stvisit)
>    1    2    3 NA's
>  180  169   22   29
> I get the factor version.
> 
> Frankly I feel that this behaviour is non-intuitive and potentially 
> problematic. Nor have I seen warnings about this in the various text 
> books on R.
> Can anyone comment on why this should occur?

I haven't looked at discussions of 'attach()' for a while,
since I rarely use it nowadays (I find with() more convenient
most of the time), but Chapter 6 in 'An Introduction to R'
does discuss it.
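
For what it's worth, here is a minimal sketch of the with() approach,
assuming a small made-up data frame in place of your 'birth' (the
values are invented for illustration). with() evaluates the expression
against the data frame as it currently is, so there is no attached
copy that can go stale:

```r
# Made-up stand-in for the poster's data frame; values are illustrative only
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 3L, NA))

# Convert the column in the data frame itself; attach() is not involved
birth$t1stvisit <- factor(birth$t1stvisit)

# with() looks up t1stvisit in the current 'birth'
is_fac <- with(birth, is.factor(t1stvisit))
counts <- with(birth, table(t1stvisit, useNA = "ifany"))

is_fac
counts
```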

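There are indeed two versions of 'birth': attach() places a copy of
the data frame on the search path, and that copy is separate from the
'birth' in your workspace.
Your basic problem is which version of 'birth' is being modified.
Hint: it's NOT the attached version. Assigning to birth$t1stvisit
changes only the workspace copy.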
Small example:

  dat <- data.frame(x=1:3)
  attach(dat)
  dat$y <- 4:6
  y
  #Error: object 'y' not found
  dat$y
  #[1] 4 5 6
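
Applied to your case, the same thing can be sketched as follows, again
with an invented stand-in for 'birth' (the values are not yours). The
attached copy is a snapshot taken when attach() was called, so it has
to be detached and re-attached before it reflects the change:

```r
# Invented stand-in for the poster's data frame; values are illustrative only
birth <- data.frame(t1stvisit = c(1L, 1L, 2L, 3L, NA))
attach(birth)

# This modifies 'birth' in the workspace, not the attached copy
birth$t1stvisit <- factor(birth$t1stvisit)

stale  <- is.factor(t1stvisit)        # attached copy: still integer
actual <- is.factor(birth$t1stvisit)  # workspace copy: now a factor

# Refresh the attached copy by detaching and attaching again
detach(birth)
attach(birth)
fresh <- is.factor(t1stvisit)
detach(birth)

c(stale = stale, actual = actual, fresh = fresh)
```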

BTW, you don't need as.factor(); use factor().

  -Peter Ehlers


> Many thanks,
> Alan Kelly
> 
> Dr. Alan Kelly
> Department of Public Health & Primary Care
> Trinity College Dublin
> 



