[R] Assigning cases to groupings based on the values of several variables

Duncan Murdoch murdoch.duncan at gmail.com
Fri Dec 7 13:54:16 CET 2012


On 12-12-07 7:27 AM, Dimitri Liakhovitski wrote:
> Dear R-ers,
>
> my task is simple: to assign cases to desired groupings based on the
> combined values on 2 variables. I can think of 3 methods of doing it.
> Method 1 seems to me pretty r-like, but it requires a lot of lines of code
> - onerous.

Since your groups are so regular, you can compute the groups directly. 
Convert each column to a factor (this might have happened automatically, 
depending on your data and options), then use as.integer to convert to a 
numeric value.

So a simple solution would be

mydata$mygroup.m4 <- with(mydata,
                              4*(2-as.integer(factor(sex)))
                              + as.integer(factor(age)))
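
To see why the arithmetic works, here is a minimal sketch of the
intermediate values, assuming the example mydata from the original post.
By default factor() sorts its levels, so "f" gets code 1 and "m" gets
code 2; subtracting from 2 flips that to m = 0, f = 1. The names sex.code,
age.code and mygroup are only for illustration.

```r
# Example data copied from the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))

# Integer codes follow the sorted factor levels
sex.code <- as.integer(factor(mydata$sex))   # "f" -> 1, "m" -> 2
age.code <- as.integer(factor(mydata$age))   # ages 1:4 -> codes 1:4

# "m" rows get offset 0, "f" rows get offset 4
mygroup <- 4 * (2 - sex.code) + age.code
print(cbind(mydata, sex.code, age.code, mygroup))
```

Note that this relies on every age value 1:4 actually appearing in the
data; a missing age would shift the factor codes.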

It would be a little simpler if you wanted the sex factor in alphabetical 
order; then you wouldn't need to subtract from 2.
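
As a variation (not part of the solution above), one could also fix the
level order explicitly rather than rely on sorting. This is only a sketch;
the column name mygroup.m4b and the helper sex.f are made up for
illustration, and it assumes sex takes only the values "m" and "f".

```r
# Example data copied from the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))

# Explicit level order: "m" -> 1, "f" -> 2, regardless of alphabetical sorting
sex.f <- factor(mydata$sex, levels = c("m", "f"))

# Hypothetical column name; "m" rows get offset 0, "f" rows get offset 4
mydata$mygroup.m4b <- 4 * (as.integer(sex.f) - 1) + as.integer(factor(mydata$age))
print(mydata)
```

Stating the levels makes the intended ordering visible in the code, at the
cost of having to list them.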

If your real data wasn't so regular, another approach would be to set up 
a matrix, indexed by sex and age, that gives the desired group number. 
That is somewhat like your "groupings" solution; I'm not sure it would 
be preferable to what you did.
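
A minimal sketch of the matrix idea, assuming the same example data. The
names lookup and mygroup.m5 are hypothetical, and the group numbers in the
matrix could be any values you like, not just 1:8.

```r
# Example data copied from the original post
mydata <- data.frame(sex = rep(c(rep("m", 4), rep("f", 4)), 2),
                     age = rep(c(1:4, 1:4), 2))

# Lookup table: one row per sex, one column per age, cell = desired group
lookup <- matrix(1:8, nrow = 2, byrow = TRUE,
                 dimnames = list(sex = c("m", "f"), age = as.character(1:4)))

# Index with a two-column character matrix: one (sex, age) pair per data row.
# as.character() covers the case where sex was read in as a factor.
idx <- cbind(as.character(mydata$sex), as.character(mydata$age))
mydata$mygroup.m5 <- lookup[idx]
print(mydata)
```

The lookup is vectorized, so there is no loop over the rows of mydata.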

Duncan Murdoch

> Method 2 is a loop, so not very good - as it loops through all rows of
> mydata.
> Method 3 is a loop but loops through fewer lines, so it seems to me more
> efficient.
> Can you please tell me:
> 1. Which of my methods is more efficient?
> 2. Is there maybe an even more efficient r-like way of doing it?
> Imagine - "mydata" is actually a very tall data frame.
> Thanks a lot!
> Dimitri
>
> ### My Data:
> mydata<-data.frame(sex=rep(c(rep("m",4),rep("f",4)),2),age=rep(c(1:4,1:4),2))
> (mydata)
>
> ### My desired assignments (in column "mygroup")
> groupings<-data.frame(sex=c(rep("m",4),rep("f",4)),age=c(1:4,1:4),mygroup=1:8)
> (groupings)
>
> # No, I don't need a solution where the last column of "groupings" is
> # stacked twice and bound to "mydata"
>
> # Method 1 of assigning to groups - requires a lot of lines of code:
> mydata$mygroup.m1<-NA
> mydata[(mydata$sex %in% "m")&(mydata$age %in% 1),"mygroup.m1"]<-1
> mydata[(mydata$sex %in% "m")&(mydata$age %in% 2),"mygroup.m1"]<-2
> mydata[(mydata$sex %in% "m")&(mydata$age %in% 3),"mygroup.m1"]<-3
> mydata[(mydata$sex %in% "m")&(mydata$age %in% 4),"mygroup.m1"]<-4
> mydata[(mydata$sex %in% "f")&(mydata$age %in% 1),"mygroup.m1"]<-5
> mydata[(mydata$sex %in% "f")&(mydata$age %in% 2),"mygroup.m1"]<-6
> mydata[(mydata$sex %in% "f")&(mydata$age %in% 3),"mygroup.m1"]<-7
> mydata[(mydata$sex %in% "f")&(mydata$age %in% 4),"mygroup.m1"]<-8
> (mydata)
>
> # Method 2 of assigning to groups - very "loopy":
> mydata$mygroup.m2<-NA
> for(i in 1:nrow(mydata)){  # i<-1
>    mysex<-mydata[i,"sex"]
>    myage<-mydata[i,"age"]
>    mydata[i,"mygroup.m2"]<-groupings[(groupings$sex %in%
> mysex)&(groupings$age %in% myage),"mygroup"]
> }
> (mydata)
>
> # Method 3 of assigning to groups - also "loopy", but less than Method 2:
> mydata$mygroup.m3<-NA
> for(i in 1:nrow(groupings)){  # i<-1
>    mysex<-groupings[i,"sex"]
>    myage<-groupings[i,"age"]
>    mydata[(mydata$sex %in% mysex)&(mydata$age %in%
> myage),"mygroup.m3"]<-groupings[i,"mygroup"]
> }
> (mydata)
>



