[R] Drop firms in unbalanced panel if not more than 5 observations in consecutive years for all variables

Gabor Grothendieck ggrothendieck at gmail.com
Thu Jul 22 13:40:14 CEST 2010


On Thu, Jul 22, 2010 at 5:18 AM, Christian Schoder
<schoc152 at newschool.edu> wrote:
> Dear R users,
>
> A few weeks ago I consulted the list with a similar question.
> However, my task has changed slightly, but enough for me to get lost
> again, so I would appreciate any help on the following issue.
>
> I use the plm package and work with firm-level data in a panel. I would
> like to eliminate all firms that do not fulfill the requirement of
> having an observation in every variable used for at least x consecutive
> years.
>
> For illustration of the problem assume the following data set
>> data
>   id year  y  z
> 1   a 2000  1  1
> 2   b 2000 NA  2
> 3   b 2001  3  3
> 4   c 1999  1  1
> 5   c 2000  2  2
> 6   c 2001  4 NA
> 7   c 2002  5  4
> 8   d 1998  6  5
> 9   d 1999  5 NA
> 10  d 2000  6  6
> 11  d 2001  7  7
> 12  d 2002  3  6
> where id is the index of the firm, year the index for the year, and y
> and z are variables. Now, I would like to get rid of all firms with,
> let's say, fewer than 3 consecutive years in which there are observations
> for every variable. Hence, the procedure should yield
>> data.reduced
>   id year  y  z
> 1   d 1998  6  5
> 2   d 1999  5 NA
> 3   d 2000  6  6
> 4   d 2001  7  7
> 5   d 2002  3  6
>

Try this:

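   # DF is the data frame shown above as data. y * z is NA when either is NA,
   # so a firm is kept only if its longest NA-free stretch has >= 3 rows.
   do.call(rbind, by(DF, DF$id, function(x)
     if (length(na.contiguous(x$y * x$z)) >= 3) x))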
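Note that na.contiguous works on row positions, not on the values of year, so the one-liner relies on each firm's rows being sorted by year with no missing years in between. It may also stop with an error for a firm whose rows all contain an NA. If either could be an issue in the real data, something along the following lines might be a starting point. It is only a sketch: longest_complete_run, vars and min_years are names made up for this illustration, and the data frame is the example from the question, rebuilt here as DF.

```r
# Example data from the question (named DF to match the reply above)
DF <- data.frame(
  id   = c("a", "b", "b", "c", "c", "c", "c", "d", "d", "d", "d", "d"),
  year = c(2000, 2000, 2001, 1999, 2000, 2001, 2002,
           1998, 1999, 2000, 2001, 2002),
  y    = c(1, NA, 3, 1, 2, 4, 5, 6, 5, 6, 7, 3),
  z    = c(1, 2, 3, 1, 2, NA, 4, 5, NA, 6, 7, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for the rows of one firm, return the length of the
# longest run of consecutive years in which all of 'vars' are observed.
longest_complete_run <- function(x, vars = c("y", "z")) {
  x  <- x[order(x$year), , drop = FALSE]
  ok <- complete.cases(x[, vars, drop = FALSE])
  # Start a new block after a gap in the years or a change in completeness
  block <- cumsum(c(TRUE, diff(x$year) != 1 | diff(ok) != 0))
  # Complete rows per block: the block length if complete, 0 otherwise
  max(tapply(ok, block, sum))
}

min_years <- 3  # assumed threshold, as in the question
run_len <- sapply(split(DF, DF$id), longest_complete_run)
DF.reduced <- DF[DF$id %in% names(run_len)[run_len >= min_years], ]
print(DF.reduced)
```

On the example data this keeps only firm d, whose longest complete stretch is 2000 to 2002. Adding further variables only requires extending vars.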


