[R] using complete.cases() with nested factors

hadley wickham h.wickham at gmail.com
Fri Sep 5 00:51:11 CEST 2008


On Thu, Sep 4, 2008 at 4:19 PM, Ken Knoblauch <ken.knoblauch at inserm.fr> wrote:
> Andrew Barr <wabarr <at> gmail.com> writes:
>> This may be a newbie question.  I have a dataframe that looks like the
>> sample at the bottom of the email.  I have monthly precipitation data
>> from several sites over several years.  For each site, I need to extract
>> years that have a complete series of 12 monthly precipitation values,
>> while excluding that year for sites with incomplete data.  I can't
>> figure out how to do this gracefully (i.e. without a silly for loop).
>> Any help will be appreciated, thanks!
>> SiteID    year    month    precip(mm)
>> 670090    1941    jan    2998
>> 670090    1941    feb    1299
>> 670090    1941    mar    1007
>> 670090    1941    apr    354
>> 670090    1941    may    88
>> 670090    1941    jun    156
>> 670090    1941    jul    8
>> 670090    1941    aug    4
>> 670090    1941    sep    8
>> 670090    1941    oct    58
>> 670090    1941    nov    397
>> 670090    1941    dec    248
>> 670090    1942    jan    NA
>> 670090    1942    feb    380
>> 670090    1942    mar    797
>> 670090    1942    apr    142
>> 670090    1942    may    43
>> 670090    1942    jun    14
>> 670090    1942    jul    70
>> 670090    1942    aug    51
>> 670090    1942    sep    0
>> 670090    1942    oct    10
>> 670090    1942    nov    235
>> 670090    1942    dec    405
>>
> There are likely more elegant solutions but this seems to work.
> If the data frame is in a variable named dd
>
> lapply(unique(dd$year), function(x) {s <- subset(dd, year == x)
>  if (nrow(s) == 12) s})

I think this is slightly more elegant, and follows the
split-apply-combine strategy:

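# one data frame per year
years <- split(dd, dd$year)
# keep only the years that have 12 rows
full_years <- Filter(function(df) nrow(df) == 12, years)
# stack the kept years back into a single data frame
do.call("rbind", full_years)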
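
Two caveats with counting rows per year: a year such as 1942 in the sample has 12 rows but an NA for January, and splitting on year alone mixes sites together. A minimal sketch that splits on site and year and also checks for missing values might look like the following. It assumes the precipitation column is named precip (the post labels it "precip(mm)"), and the names site_years, is_complete and result are made up for illustration:

```r
# Sample data modelled on the original post. The column name `precip` is an
# assumption; the post labels that column "precip(mm)".
months <- c("jan", "feb", "mar", "apr", "may", "jun",
            "jul", "aug", "sep", "oct", "nov", "dec")
dd <- data.frame(
  SiteID = 670090,
  year   = rep(c(1941, 1942), each = 12),
  month  = rep(months, times = 2),
  precip = c(2998, 1299, 1007, 354, 88, 156, 8, 4, 8, 58, 397, 248,
             NA, 380, 797, 142, 43, 14, 70, 51, 0, 10, 235, 405)
)

# Split on the site/year combination; drop = TRUE discards combinations
# that do not occur in the data.
site_years <- split(dd, list(dd$SiteID, dd$year), drop = TRUE)

# Hypothetical helper: a site-year counts as complete when it has 12 rows
# and none of its precipitation values is missing.
is_complete <- function(df) nrow(df) == 12 && !any(is.na(df$precip))

# Keep the complete site-years and stack them into one data frame.
full_site_years <- Filter(is_complete, site_years)
result <- do.call("rbind", full_site_years)
```

With the sample above only the 1941 rows should survive, since 1942 has a missing January value. If no site-year is complete, do.call("rbind", ...) on the empty list returns NULL, so that case may need handling separately.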

Hadley

-- 
http://had.co.nz/


