[R] Min of

David Winsemius dwinsemius at comcast.net
Tue Sep 13 20:54:15 CEST 2011


On Sep 13, 2011, at 1:17 PM, bradford wrote:

> With the help of Andrie on StackOverflow.com, I was able to learn  
> about
> ddply.  I have another question that is more trivial and cannot seem  
> to find
> help on IRC and do not want to bother Andrie again.

It's doubtful that he would have considered it a bother. Just post a  
question and anyone up for rep points could do it. I certainly haven't  
noticed that Andrie is slacking off despite his 14+K points.

>  I can't seem to figure
> out what to google for, so I thought I'd ask here.
>
> I have:
> library(plyr)
> df_diff <- ddply(df, .(SOURCE), summarize,
> TIME_DIFF=-unclass(diff(REQUEST_DATE)))
> df_diff
>  SOURCE TIME_DIFF
> 1      A      7.55
> 2      A      5.55
> 3      A      3.40
> 4      D     35.00
> 5      D    563.00
> 6      D     37.00
> 7      D     35.00
> 8      D    996.00
>
> ... with a lot more records.
>
> I want to essentially sort SOURCE asc, TIME_DIFF asc and output the  
> top 15
> lowest TIME_DIFFS for each SOURCE.  How do I do this?

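You might (I say "might" in the absence of a reproducible example for  
testing) do this with ave and rank (order would return sort positions  
rather than within-group ranks):

df_diff[ with( df_diff, ave(TIME_DIFF, SOURCE, FUN = function(x) rank(x, ties.method = "first")) < 16), ]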
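
For what it's worth, here is a minimal sketch of that per-group selection on made-up data, since no reproducible example was posted. The toy df_diff, the seed, and the names n, rk and top_n are assumptions for illustration only; it also assumes ties should be broken by row order (ties.method = "first") so that each SOURCE yields exactly n rows.

```r
# Hypothetical toy data standing in for the poster's df_diff
set.seed(1)
df_diff <- data.frame(
  SOURCE    = rep(c("A", "D"), times = c(20, 30)),
  TIME_DIFF = round(runif(50, 1, 1000), 2)
)

n <- 15  # number of rows to keep per SOURCE

# Rank TIME_DIFF within each SOURCE; ties.method = "first" keeps ranks unique
rk <- with(df_diff, ave(TIME_DIFF, SOURCE,
                        FUN = function(x) rank(x, ties.method = "first")))

# Keep the n smallest values per group
top_n <- df_diff[rk <= n, ]

# Sort by SOURCE ascending, then TIME_DIFF ascending
top_n <- top_n[order(top_n$SOURCE, top_n$TIME_DIFF), ]
```

The logical index only selects rows; the final order() call is what produces the "SOURCE asc, TIME_DIFF asc" arrangement asked for.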


>
> Also, what is the data type of df_diff called so that I can look  
> into it
> some more?

The second letter in a **ply call tells you. If it's a "d", then it  
returns a data frame. First letter is input class, second is output.
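
A small sketch of that naming convention, assuming the plyr package is installed. The toy data frame and the MIN_DIFF column are made up for illustration; ddply (data frame in, data frame out) and dlply (data frame in, list out) are the two functions compared.

```r
library(plyr)

# Hypothetical toy input; not the poster's data
toy <- data.frame(
  SOURCE    = c("A", "A", "D", "D"),
  TIME_DIFF = c(7.55, 5.55, 35, 563)
)

# "dd": data frame in, data frame out
res_dd <- ddply(toy, .(SOURCE), summarize, MIN_DIFF = min(TIME_DIFF))

# "dl": data frame in, list out (one element per SOURCE)
res_dl <- dlply(toy, .(SOURCE), function(d) min(d$TIME_DIFF))

is.data.frame(res_dd)
is.list(res_dl) && !is.data.frame(res_dl)
```

Calling class() or str() on the result is a quick way to check what you got back before looking up further help.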


David Winsemius, MD
West Hartford, CT


