[R] A file manipulation question

Thu Mar 4 04:46:27 CET 2004

On Wed, 2004-03-03 at 21:19, Greg Blevins wrote:
> Hello R experts,
> 
> The following problem outstrips my current programming knowledge. 
> 
> I have a dataframe with two fields that looks like the following:
> 
> ID     Contract
> 01     1
> 01     1
> 02     2
> 02     3
> 02     1
> 03     2
> 03     2
> 03     2
> 03     1
> 03     1
> 03     1
> etc...
> 
> I would like to end up with a dataframe with one row per ID where the
> value in the contract field would be the highest value recorded for a
> single ID. As you can see above, the number of rows per ID varies irregularly.
> Given the above, the new file would look like the following:
> 
> ID     Contract
> 01     1
> 02     3
> 03     2
> 
> Thanks in advance for your suggestions.

# Create the data frame; I() keeps ID as a character vector
# instead of letting data.frame() convert it to a factor
df <- data.frame(ID = I(c(rep("01", 2), rep("02", 3), rep("03", 6))),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1))

> df
   ID Contract
1  01        1
2  01        1
3  02        2
4  02        3
5  02        1
6  03        2
7  03        2
8  03        2
9  03        1
10 03        1
11 03        1

# Now use aggregate() to condense df by ID, using the max
# value of Contract
> aggregate(df$Contract, list(ID = df$ID), max)
  ID x
1 01 1
2 02 3
3 03 2
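If a named vector is enough, base R's tapply() is another way to get the per-ID maxima. This is a minimal sketch using the same example data, rebuilt here so the snippet runs on its own. maxByID and result are just illustrative names:

```r
# Same example data as above
df <- data.frame(ID = I(c(rep("01", 2), rep("02", 3), rep("03", 6))),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1))

# tapply() splits Contract by ID and applies max() to each group,
# returning a one-dimensional array named by the ID values
maxByID <- tapply(df$Contract, df$ID, max)
maxByID

# Turn the named result back into a two-column data frame
result <- data.frame(ID = names(maxByID), Contract = as.vector(maxByID))
result
```

aggregate() is usually the more convenient choice when you want a data frame back directly. tapply() is handy when a lookup by ID, such as maxByID["02"], is all you need.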

See ?aggregate for more information. By default, aggregate() names the
column holding the function's result 'x'. You can, of course, rename it as needed.
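The following sketch shows two ways to end up with a column called Contract instead of 'x'. You can rename the column after the fact, or you can pass Contract as a named list, which aggregate() then treats as a data frame so the name is carried through. res and res2 are illustrative names:

```r
# Same example data as above
df <- data.frame(ID = I(c(rep("01", 2), rep("02", 3), rep("03", 6))),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1))

# Option 1: aggregate as before, then rename the default 'x' column
res <- aggregate(df$Contract, list(ID = df$ID), max)
names(res)[names(res) == "x"] <- "Contract"
res

# Option 2: supply x as a named list so the column keeps its name
res2 <- aggregate(list(Contract = df$Contract), list(ID = df$ID), max)
res2
```

Later versions of R also provide a formula method, aggregate(Contract ~ ID, data = df, FUN = max), which keeps the column name as well.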

HTH,

Marc Schwartz