[R] use subset to trim data but include last per category

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Sun Sep 9 17:59:26 CEST 2012


dfthin <- df[ unique(c(which(df$iter %% 500 == 0), nrow(df))), ]

or

 dfthin <- subset(df, (iter %% 500 == 0) | (seq.int(nrow(df)) == nrow(df)))

N.B. You should avoid using the name "df" for your variables, because it is the name of a built-in function that you are hiding by doing so. Others may be confused, and eventually you may want to use that function yourself. One solution is to use DF for your variables... another is to use more descriptive names.
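
Both lines above keep only the very last row of the whole data frame. If the last row of each category n is wanted, as the question asks, one possible sketch uses duplicated() with fromLast = TRUE. The toy data frame DF and the names is_last and DFthin are made up for illustration, and the approach assumes the rows are already ordered by iter within each n:

```r
# Toy stand-in for the poster's data: two problem sizes, one snapshot every 10 iterations.
DF <- data.frame(
  n    = factor(rep(c("1000", "2000"), times = c(120, 73))),
  iter = c(seq(10, 1200, by = 10), seq(10, 730, by = 10))
)

# TRUE for the last row of each level of n
# (assumes rows are ordered by iter within each n).
is_last <- !duplicated(DF$n, fromLast = TRUE)

# Keep every 500th iteration plus the final snapshot of each category.
DFthin <- DF[DF$iter %% 500 == 0 | is_last, ]
print(DFthin)
```

The same condition should also work inside subset(), e.g. subset(DF, iter %% 500 == 0 | !duplicated(n, fromLast = TRUE)).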
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Giovanni Azua <bravegag at gmail.com> wrote:

>Hello,
>
>I bumped into the following funny use-case. I have too much data for a
>given plot. I have the following data frame df: 
>
>> str(df)
>'data.frame':	5015 obs. of  5 variables:
> $ n          : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ iter       : int  10 20 30 40 50 60 70 80 90 100 ...
> $ Error      : num  1.05e-02 1.24e-03 3.67e-04 1.08e-04 4.05e-05 ...
> $ Duality_Gap: num  20080 3789 855 443 321 ...
> $ Runtime    : num  0.00536 0.01353 0.01462 0.01571 0.01681 ...
>
>But if I plot e.g. Runtime vs log(Duality Gap) I have too many
>observations due to taking a snapshot every 10 iterations rather than
>say 500 and the plot looks very cluttered. So I would like to trim the
>data frame including only those records for which iter is a multiple of
>500 and so I do this:
>
>df <- subset(df, iter %% 500 == 0)
>
>This gives me almost exactly what I need except that the last and most
>important Duality Gap observations are of course gone due to the
>filtering ... I would like to change the subset clause to be iter %%
>500 == 0 _or_ the record is the last per n (n is my problem size and
>category in this case) ... how can I do that?
>
>I thought of adding a new column that flags whether a given row is the
>last element per category as "last" Boolean but this is a bit too
>complicated .. is there a simpler condition construct that can be used
>with the subset command?
>
>TIA,
>Best regards,
>Giovanni    



