[R] Building factors across two columns, is this possible?

Rui Barradas ruipbarradas at sapo.pt
Sat Nov 24 13:57:07 CET 2012


Hello,

You can do what you want, but note that the internal codes of a factor start at 1, not at 0.
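A minimal illustration of that point, using a hypothetical two-level toy factor `f` rather than the data below: the codes are positions in the levels vector, so subtracting 1 gives the 0-based numbers from the question.

```r
# A factor's internal codes are positions in its levels vector, starting at 1
f <- factor(c("sun", "stars", "sun"), levels = c("sun", "stars"))
as.integer(f)       # 1 2 1
as.integer(f) - 1L  # 0 1 0, i.e. the 0-based numbering asked for
```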


dat <- read.table(text="
V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000
", header = TRUE)

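# every distinct value across all three columns, in order of first appearance
levs <- unique(unlist(dat))

# give each column the same level set, so equal strings get equal codes
dat$V1 <- factor(dat$V1, levels = levs)
dat$V2 <- factor(dat$V2, levels = levs)
dat$V3 <- factor(dat$V3, levels = levs)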

str(dat)
'data.frame':   6 obs. of  3 variables:
  $ V1: Factor w/ 11 levels "sun","stars",..: 1 2 3 4 5 6
  $ V2: Factor w/ 11 levels "sun","stars",..: 7 7 4 7 8 4
  $ V3: Factor w/ 11 levels "sun","stars",..: 2 1 9 1 10 11
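If there are more than three columns, the same idea can be applied to all of them at once. This is a sketch, not tested against your real file: it assumes `stringsAsFactors = FALSE` so the columns start out as character (the default since R 4.0.0), uses `lapply()` in place of the separate `factor()` calls, and then subtracts 1 from the codes to get the 0-based layout from your table. The name `codes` is just a local variable for illustration.

```r
dat <- read.table(text = "
V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000
", header = TRUE, stringsAsFactors = FALSE)

# shared level set: every distinct value, in order of first appearance
# (down V1, then V2, then V3)
levs <- unique(unlist(dat))

# recode every column with the same levels; dat[] keeps the data.frame
dat[] <- lapply(dat, factor, levels = levs)

# "dog" now has the same code in every column
as.integer(dat$V1[4])  # 4 (row 4 of V1 is "dog")
as.integer(dat$V2[3])  # 4 (row 3 of V2 is "dog")

# 0-based integer codes, one column per original column
codes <- sapply(dat, as.integer) - 1L
codes
```

`codes` is a plain integer matrix with the same numbers as the table in your question; the factors in `dat` themselves stay 1-based.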


Hope this helps,

Rui Barradas
On 24-11-2012 07:33, Brian Feeny wrote:
> To clarify on my previous post, here is a representation of what I am trying to accomplish:
>
> I would like every unique value in any of the columns to be assigned a number, like so:
>
>      V1    V2       V3
> 1   sun  moon    stars
> 2 stars  moon      sun
> 3   cat   dog   catdog
> 4   dog  moon      sun
> 5  bird plane superman
> 6  1000   dog     2000
>
> Level          Value
> sun       ->    0
> stars     ->    1
> cat       ->    2
> dog       ->    3
> bird      ->    4
> 1000      ->    5
> moon      ->    6
> plane     ->    7
> catdog    ->    8
> superman  ->    9
> 2000      ->   10
> etc
> etc
>
> So internally it's represented as:
>
>      V1  V2  V3
> 1     0   6   1
> 2     1   6   0
> 3     2   3   8
> 4     3   6   0
> 5     4   7   9
> 6     5   3  10
>
> Does this make sense? I am hoping there is a way to accomplish this.
>
> Brian
>
> On Nov 23, 2012, at 11:42 PM, Brian Feeny <bfeeny at mac.com> wrote:
>
>> I am trying to make it so that two columns with similar data use the same internal numbers for the same factor levels. Here is an example:
>>
>>> read.csv("test.csv",header =FALSE,sep=",")
>>      V1    V2       V3
>> 1   sun  moon    stars
>> 2 stars  moon      sun
>> 3   cat   dog   catdog
>> 4   dog  moon      sun
>> 5  bird plane superman
>> 6  1000   dog     2000
>>> data <- read.csv("test.csv",header =FALSE,sep=",")
>>> str(data)
>> 'data.frame':	6 obs. of  3 variables:
>> $ V1: Factor w/ 6 levels "1000","bird",..: 6 5 3 4 2 1
>> $ V2: Factor w/ 3 levels "dog","moon","plane": 2 2 1 2 3 1
>> $ V3: Factor w/ 5 levels "2000","catdog",..: 3 4 2 4 5 1
>>
>>> as.numeric(data$V1)
>> [1] 6 5 3 4 2 1
>>> as.numeric(data$V2)
>> [1] 2 2 1 2 3 1
>>> as.factor(data$V1)
>> [1] sun   stars cat   dog   bird  1000
>> Levels: 1000 bird cat dog stars sun
>>> as.factor(data$V2)
>> [1] moon  moon  dog   moon  plane dog
>> Levels: dog moon plane
>>
>>
>> So notice that "dog" is 4 in V1, yet it's 1 in V2. Is there a way, either on import or afterwards, to have the factor levels computed across both columns and assigned
>> the same internal values?
>>
>> Brian
>>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.