David Winsemius dwinsemius at comcast.net
Sat Nov 24 18:35:50 CET 2012

On Nov 23, 2012, at 8:42 PM, Brian Feeny wrote:

> I am trying to make two columns with similar data use the same
> internal numbers for the same factor levels. Here is the example:
>> read.csv("test.csv",header =FALSE,sep=",")
>     V1    V2       V3
> 1   sun  moon    stars
> 2 stars  moon      sun
> 3   cat   dog   catdog
> 4   dog  moon      sun
> 5  bird plane superman
> 6  1000   dog     2000
>> data <- read.csv("test.csv",header =FALSE,sep=",")
>> str(data)
> 'data.frame':	6 obs. of  3 variables:
> $ V1: Factor w/ 6 levels "1000","bird",..: 6 5 3 4 2 1
> $ V2: Factor w/ 3 levels "dog","moon","plane": 2 2 1 2 3 1
> $ V3: Factor w/ 5 levels "2000","catdog",..: 3 4 2 4 5 1
>> as.numeric(data$V1)
> [1] 6 5 3 4 2 1
>> as.numeric(data$V2)
> [1] 2 2 1 2 3 1
>> as.factor(data$V1)
> [1] sun   stars cat   dog   bird  1000
> Levels: 1000 bird cat dog stars sun
>> as.factor(data$V2)
> [1] moon  moon  dog   moon  plane dog
> Levels: dog moon plane
> So notice "dog" is 4 in V1, yet it's 1 in V2.  Is there a way, either
> on import or afterwards, to have factors computed for both columns
> and assigned the same internal values?

 > # dat is the data frame the question calls 'data'
 > dat[] <- lapply(dat, function(x) factor(as.character(x),
 +                                         levels = levels(unlist(dat))))
 > dat
      V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000
 > levels(dat[[1]])
  [1] "1000"     "bird"     "cat"      "dog"      "stars"    "sun"
  [7] "moon"     "plane"    "2000"     "catdog"   "superman"
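
For anyone who wants to try this without test.csv, here is a minimal sketch of the same idea. It assumes the data frame is rebuilt inline, and it does not rely on the columns already being factors (since R 4.0.0, read.csv returns character columns by default). The name `all_levels` is mine, not from the post, and the shared levels come out in order of first appearance rather than in the order printed above.

```r
# Rebuild the poster's data inline so the example does not depend on test.csv.
dat <- data.frame(
  V1 = c("sun", "stars", "cat", "dog", "bird", "1000"),
  V2 = c("moon", "moon", "dog", "moon", "plane", "dog"),
  V3 = c("stars", "sun", "catdog", "sun", "superman", "2000"),
  stringsAsFactors = FALSE
)

# Collect every distinct value across all columns, in order of first appearance.
all_levels <- unique(unlist(lapply(dat, as.character), use.names = FALSE))

# Re-create each column as a factor over the shared level set.
dat[] <- lapply(dat, function(x) factor(as.character(x), levels = all_levels))

# "dog" now maps to the same integer code in every column.
dog_code <- match("dog", levels(dat$V1))
as.integer(dat$V1)[4] == dog_code   # row 4 of V1 is "dog"
as.integer(dat$V2)[3] == dog_code   # row 3 of V2 is "dog"
```

Because every column carries the full level set, as.numeric() on any of them gives codes that can be compared across columns.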

I see your "clarification". Reordering the representation can be done
by rebuilding each column with the levels in the order you want:

dat[] <- lapply(dat, factor, levels = <character vector>)
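
To make the reordering concrete, here is a small sketch on a single factor. It assumes the goal is to change the order of the levels (and therefore the integer codes) while leaving the values alone. The vector `x` and the name `new_order` are made up for illustration. One caveat worth checking in your own session: factor(x, levels = ...) reorders, whereas assigning to levels(x) relabels the existing codes in place.

```r
x <- factor(c("moon", "moon", "dog", "moon", "plane", "dog"))
levels(x)        # alphabetical by default: "dog" "moon" "plane"

# Desired level order (hypothetical; any permutation of the existing levels).
new_order <- c("plane", "dog", "moon")

# factor(..., levels = ) reorders the levels and recomputes the integer codes;
# the values themselves are unchanged.
y <- factor(x, levels = new_order)
as.character(y)  # same values as x
as.integer(y)    # codes now follow new_order

# By contrast, assigning to levels() renames the levels position by position
# and keeps the old codes, so the values change.
z <- x
levels(z) <- new_order
as.character(z)  # no longer the same values as x
```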


David Winsemius, MD
Alameda, CA, USA

