[R] grouping function

arun smartpink111 at yahoo.com
Tue May 8 20:46:52 CEST 2012


Hi Sarah,

I ran the same code from your reply.  For makegroup2(), I get 0 in place of the NA values shown in your output.

> makegroup1 <- function(x,y) {
+ group <- numeric(length(x))
+ group[x <= 1990 & y > 1990] <- 1
+ group[x <= 1991 & y > 1991] <- 2
+ group[x <= 1992 & y > 1992] <- 3
+ group
+ }
> makegroup2 <- function(x, y) {
+   ifelse(x <= 1990 & y > 1990, 1,
+       ifelse(x <= 1991 & y > 1991, 2,
+         ifelse(x <= 1992 & y > 1992, 3, 0)))
+ }
> makegroup1(df$begin,df$end)
 [1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
> makegroup2(df$begin,df$end)
 [1] 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0
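
For what it's worth, the NA output in the quoted reply is what you would get if the innermost ifelse() fell back to NA instead of 0. That is only a guess at how the two outputs came to differ, not something confirmed in this thread. A minimal sketch, where panel and makegroup2_na are illustrative names that do not appear in the original code:

```r
# Rebuild the example data from the original post. It is named 'panel' here
# (an illustrative name) to avoid confusion with the df() function.
panel <- data.frame(
  name  = rep(c("Frank", "Tony", "Edward"), each = 5),
  begin = c(1990:1994, 1991:1995, 1992:1996),
  end   = c(1995:1999, 1995:1999, 1996:2000)
)

# makegroup2() as posted: rows matching no condition fall through to 0.
makegroup2 <- function(x, y) {
  ifelse(x <= 1990 & y > 1990, 1,
    ifelse(x <= 1991 & y > 1991, 2,
      ifelse(x <= 1992 & y > 1992, 3, 0)))
}

# Hypothetical variant: identical except that unmatched rows become NA.
makegroup2_na <- function(x, y) {
  ifelse(x <= 1990 & y > 1990, 1,
    ifelse(x <= 1991 & y > 1991, 2,
      ifelse(x <= 1992 & y > 1992, 3, NA)))
}

g0  <- makegroup2(panel$begin, panel$end)
gna <- makegroup2_na(panel$begin, panel$end)

# The two versions differ only in the rows where no condition matched.
print(g0)
print(gna)
```

If that guess is right, the two functions agree on every matched row and only the fallback value differs.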


A. K.




----- Original Message -----
From: Sarah Goslee <sarah.goslee at gmail.com>
To: gps at asu.edu
Cc: "r-help at r-project.org" <r-help at r-project.org>
Sent: Tuesday, May 8, 2012 2:33 PM
Subject: Re: [R] grouping function

Hi,

On Tue, May 8, 2012 at 2:17 PM, Geoffrey Smith <gps at asu.edu> wrote:
> Hello, I would like to write a function that makes a grouping variable for
> some panel data.  The grouping variable is made conditional on the begin
> year and the end year.  Here is the code I have written so far.
>
> name <- c(rep('Frank',5), rep('Tony',5), rep('Edward',5));
> begin <- c(seq(1990,1994), seq(1991,1995), seq(1992,1996));
> end <- c(seq(1995,1999), seq(1995,1999), seq(1996,2000));
>
> df <- data.frame(name, begin, end);
> df;

Thanks for providing reproducible data. Two minor points: you don't
need ; at the end of lines, and calling your data frame df is
confusing because there's a df() function.

> #This is the part I am stuck on;
>
> makegroup <- function(x,y) {
>  group <- 0
>  if (x <= 1990 & y > 1990) {group==1}
>  if (x <= 1991 & y > 1991) {group==2}
>  if (x <= 1992 & y > 1992) {group==3}
>  return(x,y)
> }
>
> makegroup(df$begin,df$end);
>
> #I am looking for output where each observation belongs to a group
> conditional on the begin year and end year.  I would also like to use a for
> loop for programming accuracy as well;

This isn't a clear specification: a begin year of 1990 with an end year
of 1994, for instance, fits into all three groups. Do you want to
extend this to more start years, or are you only interested in those
three? In your example data every end year is later than 1992, so the
end-year conditions are always true and only the begin years matter.
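
To make the overlap concrete, one could count how many of the three conditions each row satisfies. This is only a sketch on the example data from the post; panel, years, hits and n_match are illustrative names, not part of the original code.

```r
# Example data from the original post, under the illustrative name 'panel'.
panel <- data.frame(
  name  = rep(c("Frank", "Tony", "Edward"), each = 5),
  begin = c(1990:1994, 1991:1995, 1992:1996),
  end   = c(1995:1999, 1995:1999, 1996:2000)
)

years <- 1990:1992

# One column per candidate year: TRUE where begin <= year and end > year.
hits <- sapply(years, function(yr) panel$begin <= yr & panel$end > yr)

# Number of groups each row qualifies for; anything above 1 is ambiguous.
n_match <- rowSums(hits)
print(cbind(panel, n_match))

# In this data no end year is 1992 or earlier, so the end-year tests
# never exclude a row.
print(all(panel$end > max(years)))
```

Rows with n_match greater than 1 are the ones where the order of the tests decides the group.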

Here are two methods, one that "looks like" your pseudocode, and one
that is more R-ish. They give different results because of different
handling of cases that fit all three groups. Rearranging the
statements in makegroup1() from broadest to most restrictive would
make it give the same result as makegroup2().


makegroup1 <- function(x, y) {
  group <- numeric(length(x))
  group[x <= 1990 & y > 1990] <- 1
  group[x <= 1991 & y > 1991] <- 2
  group[x <= 1992 & y > 1992] <- 3
  group
}

makegroup2 <- function(x, y) {
  ifelse(x <= 1990 & y > 1990, 1,
    ifelse(x <= 1991 & y > 1991, 2,
      ifelse(x <= 1992 & y > 1992, 3, 0)))
}

> makegroup1(df$begin,df$end)
[1] 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
> makegroup2(df$begin,df$end)
[1]  1  2  3 NA NA  2  3 NA NA NA  3 NA NA NA NA
> df
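
Since the original question asked for a for loop and the reply raises the possibility of more start years, here is one way the makegroup1() approach might be generalized. It is a sketch rather than anything proposed in the thread: makegroup_loop, its years argument, the first_wins flag and the data frame name panel are all hypothetical. Because later assignments overwrite earlier ones, visiting the years from last to first lets the earliest matching year win, which mirrors makegroup2(); visiting them in order mirrors makegroup1().

```r
# Example data from the original post, under the illustrative name 'panel'.
panel <- data.frame(
  name  = rep(c("Frank", "Tony", "Edward"), each = 5),
  begin = c(1990:1994, 1991:1995, 1992:1996),
  end   = c(1995:1999, 1995:1999, 1996:2000)
)

# Hypothetical generalization: group i corresponds to years[i].
makegroup_loop <- function(begin, end, years = 1990:1992, first_wins = TRUE) {
  group <- numeric(length(begin))   # 0 means "matched no year"
  idx <- seq_along(years)
  # Later assignments overwrite earlier ones, so the year that should
  # take priority has to be visited last.
  if (first_wins) idx <- rev(idx)
  for (i in idx) {
    group[begin <= years[i] & end > years[i]] <- i
  }
  group
}

first <- makegroup_loop(panel$begin, panel$end)                      # like makegroup2()
last  <- makegroup_loop(panel$begin, panel$end, first_wins = FALSE)  # like makegroup1()
print(first)  # 1 2 3 0 0 2 3 0 0 0 3 0 0 0 0
print(last)   # 3 3 3 0 0 3 3 0 0 0 3 0 0 0 0
```

Making the priority an explicit argument forces a decision about which group an overlapping row belongs to, which is the ambiguity in the original specification.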


But really, it's a better idea to develop an unambiguous statement of
your desired output.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



