[R] A file manipulation question

Marc Schwartz MSchwartz at medanalytics.com
Thu Mar 4 04:46:27 CET 2004


On Wed, 2004-03-03 at 21:19, Greg Blevins wrote:
> Hello R experts,
> 
> The following problem outstrips my current programming knowledge. 
> 
> I have a dataframe with two fields that looks like the following:
> 
> ID     Contract
> 01     1
> 01     1
> 02     2
> 02     3
> 02     1
> 03     2
> 03     2
> 03     2
> 03     1
> 03     1
> 03     1
> etc...
> 
> I would like to end up with a dataframe with one row per ID where the
> value in the contract field would be the highest value recorded for a
> single ID. As you can see above, the number of IDs varies irregularly.
> Given the above, the new file would look like the following:
> 
> ID     Contract
> 01     1
> 02     3
> 03     2
> 
> Thanks in advance for your suggestions.

# Create the data frame
df <- data.frame(ID = I(c(rep("01", 2), rep("02", 3), rep("03", 6))),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1))

> df
   ID Contract
1  01        1
2  01        1
3  02        2
4  02        3
5  02        1
6  03        2
7  03        2
8  03        2
9  03        1
10 03        1
11 03        1

# Now use aggregate() to condense df by ID, using the max
# value of Contract
> aggregate(df$Contract, list(ID = df$ID), max)
  ID x
1 01 1
2 02 3
3 03 2


See ?aggregate for more information.  By default, aggregate() names the
column that holds the function's result 'x'. You can of course rename it
as needed.
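
A minimal sketch of the renaming step, assuming the same example data as
above (rebuilt here with plain character IDs so the block runs on its
own). The object name 'res' is only illustrative. The formula call at the
end is an alternative available in more recent versions of R; it keeps
the original column name, so no renaming is needed.

```r
# Rebuild the example data (assumption: IDs stored as plain character strings)
df <- data.frame(ID = rep(c("01", "02", "03"), times = c(2, 3, 6)),
                 Contract = c(1, 1, 2, 3, 1, 2, 2, 2, 1, 1, 1))

# Condense by ID, taking the max of Contract; the result column is named 'x'
res <- aggregate(df$Contract, list(ID = df$ID), max)

# Rename the default 'x' column back to 'Contract'
names(res)[names(res) == "x"] <- "Contract"
res

# Alternative in recent R versions: the formula interface keeps the name
res2 <- aggregate(Contract ~ ID, data = df, FUN = max)
res2
```

Either way you get one row per ID with the largest Contract value for
that ID.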

HTH,

Marc Schwartz



