[R] grouping function

Sarah Goslee sarah.goslee at gmail.com
Tue May 8 20:33:21 CEST 2012


Hi,

On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith <gps at asu.edu> wrote:
> Hello, I would like to write a function that makes a grouping variable for
> some panel data.  The grouping variable is made conditional on the begin
> year and the end year.  Here is the code I have written so far.
>
> name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
> begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
> end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));
>
> df <- data.frame(name, begin, end);
> df;

Thanks for providing reproducible data. Two minor points: you don't
need ; at the end of lines, and calling your data frame df is
confusing because there's a df() function.

> #This is the part I am stuck on;
>
> makegroup <- function(x,y) {
>  group <- 0
>  if (x <= 1990 & y > 1990) {group==1}
>  if (x <= 1991 & y > 1991) {group==2}
>  if (x <= 1992 & y > 1992) {group==3}
>  return(x,y)
> }
>
> makegroup(df$begin,df$end);
>
> #I am looking for output where each observation belongs to a group
> conditional on the begin year and end year.  I would also like to use a for
> loop for programming accuracy as well;

This isn't a clear specification: an observation with begin 1990 and
end 1994, for instance, fits into all three groups. Do you want to
extend this to more begin years, or are you only interested in those
three? In your sample data every end year is 1995 or later, so the
end-year conditions are always true and only the begin years affect
the grouping.
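
To see the ambiguity concretely, here is a rough sketch that counts how
many of the three conditions each observation satisfies. The names
matches and n_match are made up for this illustration; begin and end
are the vectors from your post.

```r
# Sample data from the original post
begin <- c(1990:1994, 1991:1995, 1992:1996)
end   <- c(1995:1999, 1995:1999, 1996:2000)

# One logical column per group condition (hypothetical helper matrix)
matches <- cbind(g1 = begin <= 1990 & end > 1990,
                 g2 = begin <= 1991 & end > 1991,
                 g3 = begin <= 1992 & end > 1992)

# Number of groups each observation qualifies for
n_match <- rowSums(matches)
n_match
# [1] 3 2 1 0 0 2 1 0 0 0 1 0 0 0 0
```

Any value above 1 is an observation whose group depends on how ties
are resolved.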

Here are two methods, one that "looks like" your pseudocode, and one
that is more R-ish. They give different results because they handle
observations that fit more than one group differently: makegroup1()
keeps the last group that matches, while makegroup2() keeps the first.
Reversing the order of the assignments in makegroup1(), so they run
from broadest (1992) to most restrictive (1990), would make it give
the same result as makegroup2().


makegroup1 <- function(x, y) {
  # start with 0 = no group; later assignments overwrite earlier ones
  group <- numeric(length(x))
  group[x <= 1990 & y > 1990] <- 1
  group[x <= 1991 & y > 1991] <- 2
  group[x <= 1992 & y > 1992] <- 3
  group
}

makegroup2 <- function(x, y) {
  # nested ifelse(): the first condition that is TRUE wins; 0 = no group
  ifelse(x <= 1990 & y > 1990, 1,
         ifelse(x <= 1991 & y > 1991, 2,
                ifelse(x <= 1992 & y > 1992, 3, 0)))
}

> makegroup1(df$begin,df$end)
 [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
> makegroup2(df$begin,df$end)
 [1] 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0
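
As a quick check of the point about statement order, here is a sketch
with the assignments in makegroup1() reversed. The name makegroup1_rev
is made up for this example; makegroup2() is repeated so the snippet
runs on its own.

```r
# Sample data from the original post
begin <- c(1990:1994, 1991:1995, 1992:1996)
end   <- c(1995:1999, 1995:1999, 1996:2000)

# makegroup1() with the assignments run from broadest to most restrictive
# (hypothetical name, used only for this comparison)
makegroup1_rev <- function(x, y) {
  group <- numeric(length(x))
  group[x <= 1992 & y > 1992] <- 3
  group[x <= 1991 & y > 1991] <- 2
  group[x <= 1990 & y > 1990] <- 1
  group
}

makegroup2 <- function(x, y) {
  ifelse(x <= 1990 & y > 1990, 1,
         ifelse(x <= 1991 & y > 1991, 2,
                ifelse(x <= 1992 & y > 1992, 3, 0)))
}

# Both approaches now resolve overlaps in favour of the earliest year
all.equal(makegroup1_rev(begin, end), makegroup2(begin, end))
# [1] TRUE
```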


But really, it's a better idea to develop an unambiguous statement of
your desired output.
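
For example, one unambiguous rule would be "each observation gets the
first group, in year order, whose condition it meets, and 0 if it meets
none". That is only my guess at what you want. A minimal sketch using
the for loop you asked about; the function name makegroup_loop and its
years argument are invented for this illustration.

```r
# Sample data from the original post
begin <- c(1990:1994, 1991:1995, 1992:1996)
end   <- c(1995:1999, 1995:1999, 1996:2000)

# Hypothetical function: group i corresponds to years[i]; first match wins
makegroup_loop <- function(x, y, years = c(1990, 1991, 1992)) {
  group <- numeric(length(x))   # 0 = no group assigned
  for (i in seq_along(years)) {
    # only fill observations that have no group yet
    hit <- group == 0 & x <= years[i] & y > years[i]
    group[hit] <- i
  }
  group
}

makegroup_loop(begin, end)
# [1] 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0
```

Extending this to more years only requires a longer years vector.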

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org


