[R] Drop firms in unbalanced panel if not more than 5 observations in consecutive years for all variables

Dimitris Rizopoulos d.rizopoulos at erasmusmc.nl
Thu Jul 22 13:34:38 CEST 2010


Try this:

Dat <- read.table(text =
"id year  y  z
1   a 2000  1  1
2   b 2000 NA  2
3   b 2001  3  3
4   c 1999  1  1
5   c 2000  2  2
6   c 2001  4 NA
7   c 2002  5  4
8   d 1998  6  5
9   d 1999  5 NA
10  d 2000  6  6
11  d 2001  7  7
12  d 2002  3  6", header = TRUE)

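n.years <- 3 # minimum number of consecutive complete years
na.ind <- !rowSums(is.na(Dat[-(1:2)])) # TRUE where y and z are both observed
# runs of complete rows per firm (assumes rows sorted by year, no gaps)
ind <- ave(na.ind, Dat$id, FUN = function (x) {
    r <- rle(x); any(r$values & r$lengths >= n.years)
})
Dat[ind, ]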
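If the panel can also have gaps in the year index (e.g. a firm observed in 1998, 1999, 2001 and 2002), looking at runs of rows is not enough, because two adjacent rows need not be adjacent years. Below is a minimal sketch under that assumption. `keep_consecutive()` is a hypothetical helper, not part of base R or plm; it assumes one row per firm-year with whole-number years, and by default treats every column other than `id` and `year` as a variable of interest, like `Dat[-(1:2)]` does.

```r
# Christian's example data, without the row-number column
Dat <- read.table(text = "
id year  y  z
a 2000  1  1
b 2000 NA  2
b 2001  3  3
c 1999  1  1
c 2000  2  2
c 2001  4 NA
c 2002  5  4
d 1998  6  5
d 1999  5 NA
d 2000  6  6
d 2001  7  7
d 2002  3  6
", header = TRUE)

# Hypothetical helper, not from the original post: keep all rows of every
# firm with at least `n_years` consecutive years in which all `vars` are
# observed. Assumes one row per firm-year and whole-number years.
keep_consecutive <- function(data, n_years, id = "id", year = "year",
                             vars = setdiff(names(data), c(id, year))) {
  complete <- complete.cases(data[vars])

  # Length of the longest run of consecutive years in `yrs`
  longest_run <- function(yrs) {
    if (length(yrs) == 0) return(0L)
    yrs <- sort(yrs)
    # A new run starts wherever a year is not the previous one + 1; since
    # only complete rows are passed in, incomplete years and panel gaps
    # both break a run
    run_id <- cumsum(c(TRUE, diff(yrs) != 1))
    max(tabulate(run_id))
  }

  firms <- unique(data[[id]])
  runs <- vapply(firms, function(f) {
    longest_run(data[[year]][data[[id]] == f & complete])
  }, integer(1))

  data[data[[id]] %in% firms[runs >= n_years], , drop = FALSE]
}

keep_consecutive(Dat, n_years = 3)  # only firm d's five rows remain
```

As in Christian's expected output, the incomplete 1999 row of firm d is kept, since the filter selects firms rather than rows. A firm with complete data in 1998, 1999, 2001 and 2002 has a longest run of 2, so it is dropped for `n_years = 3` even though it has four complete years in total.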


I hope it helps.

Best,
Dimitris


On 7/22/2010 11:18 AM, Christian Schoder wrote:
> Dear R-user,
>
> A few weeks ago I consulted the list with a similar question.
> However, my task has changed slightly, but enough that I am lost again,
> so I would appreciate any help with the following issue.
>
> I use the plm package and work with firm-level data in a panel. I would
> like to eliminate all firms that do not fulfill the requirement of
> having an observation in every variable used for at least x consecutive
> years.
>
> To illustrate the problem, assume the following data set:
>> data
>     id year  y  z
> 1   a 2000  1  1
> 2   b 2000 NA  2
> 3   b 2001  3  3
> 4   c 1999  1  1
> 5   c 2000  2  2
> 6   c 2001  4 NA
> 7   c 2002  5  4
> 8   d 1998  6  5
> 9   d 1999  5 NA
> 10  d 2000  6  6
> 11  d 2001  7  7
> 12  d 2002  3  6
> where id is the index of the firm, year the index for the year, and y
> and z are variables. Now, I would like to get rid of all firms with,
> let's say, fewer than 3 consecutive years in which there are observations
> for every variable. Hence, the procedure should yield
>> data.reduced
>     id year  y  z
> 1   d 1998  6  5
> 2   d 1999  5 NA
> 3   d 2000  6  6
> 4   d 2001  7  7
> 5   d 2002  3  6
>
> Thank you very much for any help!
>
> Cheers, Christian

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014



More information about the R-help mailing list