[R] use subset to trim data but include last per category

Giovanni Azua bravegag at gmail.com
Sun Sep 9 17:13:50 CEST 2012


Hello,

I bumped into the following funny use case: I have too much data for a given plot. Here is the data frame df:

> str(df)
'data.frame':	5015 obs. of  5 variables:
 $ n          : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ iter       : int  10 20 30 40 50 60 70 80 90 100 ...
 $ Error      : num  1.05e-02 1.24e-03 3.67e-04 1.08e-04 4.05e-05 ...
 $ Duality_Gap: num  20080 3789 855 443 321 ...
 $ Runtime    : num  0.00536 0.01353 0.01462 0.01571 0.01681 ...

But if I plot, e.g., Runtime vs. log(Duality_Gap), there are too many observations, because a snapshot was taken every 10 iterations rather than, say, every 500, and the plot looks very cluttered. So I would like to trim the data frame to include only those records for which iter is a multiple of 500, and so I do this:

df <- subset(df, iter %% 500 == 0)

This gives me almost exactly what I need, except that the last and most important Duality_Gap observations are, of course, gone due to the filtering. I would like to change the subset condition to be iter %% 500 == 0 _or_ the record is the last one per n (n is my problem size, and the category in this case). How can I do that?
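A minimal sketch of one way to express that "or" directly inside subset(), using base R's duplicated() with fromLast = TRUE. The small data frame below is hypothetical toy data standing in for df, and the approach assumes rows are already ordered by iter within each n (as the str() output above suggests):

```r
# Hypothetical toy stand-in for df: two problem sizes, snapshots every
# 10 iterations, each run ending at a value that is not a multiple of 500.
df <- data.frame(
  n    = factor(rep(c("1000", "2000"), times = c(105, 73))),
  iter = c(seq(10, 1050, by = 10), seq(10, 730, by = 10))
)
df$Duality_Gap <- 1 / df$iter  # placeholder values, not real results

# duplicated(n, fromLast = TRUE) is FALSE only for the last occurrence
# of each level of n, so its negation flags the last row per category.
# Assumes rows are sorted by iter within each n.
trimmed <- subset(df, iter %% 500 == 0 | !duplicated(n, fromLast = TRUE))
print(trimmed)
```

If the rows might not be sorted, ordering them first with df[order(df$n, df$iter), ] keeps this condition valid.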

I thought of adding a new Boolean column, "last", that flags whether a given row is the last element of its category, but this seems a bit too complicated. Is there a simpler condition construct that can be used with subset()?
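An alternative sketch that avoids an extra column and does not depend on row order: ave() computes, for every row, the maximum iter within that row's n group, and the comparison can be written inline in the subset() condition. The toy data is again hypothetical, here deliberately reversed so that iter is not ascending:

```r
# Hypothetical toy data, rows reversed so iter is descending within n.
df <- data.frame(
  n    = factor(rep(c("1000", "2000"), times = c(105, 73))),
  iter = c(seq(10, 1050, by = 10), seq(10, 730, by = 10))
)
df <- df[nrow(df):1, ]

# ave(iter, n, FUN = max) returns the per-n maximum of iter for each row,
# so the equality is TRUE exactly for the final snapshot of each n,
# whatever the row order.
trimmed <- subset(df, iter %% 500 == 0 | iter == ave(iter, n, FUN = max))
print(trimmed)
```

This treats "last" as "largest iter per n" rather than "last row per n"; the two coincide when the data is sorted.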

TIA,
Best regards,
Giovanni    

