[R] Help finding first value in a BY group

William Dunlap wdunlap at tibco.com
Fri Mar 15 22:44:33 CET 2013


ddply() is very handy, but sometimes it seems like overkill
to select rows from a dataset by pulling into pieces, selecting
a row from each piece, the pasting the pieces back together
again.  Information like row names can be lost.

The following uses a subscript to pull out the rows of interest.
We compute the subcript with ave(), which does the same sort of
looping that things in plyr do, but it operates on an integer vector
rather than the whole data.frame.
  > w <- with(PeriodSKUForecast, ave(Period, SKU, FUN=order)) 
  > PeriodSKUForecast[w==1,]
    Period SKU Forecast
  1      1  A1       99
  4      1  X4       63
  9      1  K2      207
Note that the output rows are in the order they were in in the
input data.frame and their row names come from the input also.
If you want the first two periods for each SKU use the subscript w<=2. 

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Marc Schwartz
> Sent: Friday, March 15, 2013 11:57 AM
> To: arun
> Cc: R help
> Subject: Re: [R] Help finding first value in a BY group
> 
> Hi,
> 
> There is a potential gotcha with the approach of using head(..., 1) in each of the solutions
> that Arun has below, which is the assumption that the data is sorted, as is the case in the
> example data. It seems reasonable to consider that the real data at hand may not be
> entered in order or presorted.
> 
> If the data is not sorted (switching the order of the two K2 related entries):
> 
> Period <- c(1, 2, 3, 1, 2, 3, 4, 2, 1)
> Forecast <- c(99, 103, 128, 63, 69, 72, 75, 201, 207)
> SKU <- c("A1","A1","A1","X4","X4","X4","X4","K2","K2")
> 
> PeriodSKUForecast <- data.frame(Period, SKU, Forecast)
> 
> > PeriodSKUForecast
>   Period SKU Forecast
> 1      1  A1       99
> 2      2  A1      103
> 3      3  A1      128
> 4      1  X4       63
> 5      2  X4       69
> 6      3  X4       72
> 7      4  X4       75
> 8      2  K2      201
> 9      1  K2      207
> 
> 
> > with(PeriodSKUForecast,tapply(Forecast,SKU,head,1))
>  A1  K2  X4
>  99 201  63
> 
> > aggregate(Forecast~SKU,data=PeriodSKUForecast,head,1)
>   SKU Forecast
> 1  A1       99
> 2  K2      201
> 3  X4       63
> 
> 
> Note that the wrong value for K2 is returned.
> 
> You would either have to pre-sort the data frame before using these approaches:
> 
> NewDF <- PeriodSKUForecast[with(PeriodSKUForecast, order(SKU, Period)), ]
> 
> > NewDF
>   Period SKU Forecast
> 1      1  A1       99
> 2      2  A1      103
> 3      3  A1      128
> 9      1  K2      207
> 8      2  K2      201
> 4      1  X4       63
> 5      2  X4       69
> 6      3  X4       72
> 7      4  X4       75
> 
> > with(NewDF,tapply(Forecast,SKU,head,1))
>  A1  K2  X4
>  99 207  63
> 
> 
> Or consider an approach that does not depend upon the sort order, but which subsets
> based upon the minimum value of Period for each SKU:
> 
> do.call(rbind, lapply(split(PeriodSKUForecast, PeriodSKUForecast$SKU),
>                       function(x) x[which.min(x$Period), ]))
>    Period SKU Forecast
> A1      1  A1       99
> K2      1  K2      207
> X4      1  X4       63
> 
> or remove the Period column if you don't want it:
> 
> > do.call(rbind, lapply(split(PeriodSKUForecast, PeriodSKUForecast$SKU),
>                         function(x) x[which.min(x$Period), -1]))
>    SKU Forecast
> A1  A1       99
> K2  K2      207
> X4  X4       63
> 
> 
> 
> Regards,
> 
> Marc Schwartz
> 
> 
> On Mar 15, 2013, at 12:37 PM, arun <smartpink111 at yahoo.com> wrote:
> 
> > Hi,
> > Try:
> > data.frame(Forecast=with(PeriodSKUForecast,tapply(Forecast,SKU,head,1)))
> > #   Forecast
> > #A1       99
> > #K2      207
> > #X4       63
> >
> > #or
> >  aggregate(Forecast~SKU,data=PeriodSKUForecast,head,1)
> > #  SKU Forecast
> > #1  A1       99
> > #2  K2      207
> > #3  X4       63
> >
> > #or
> > library(plyr)
> > ddply(PeriodSKUForecast,.(SKU),summarise, Forecast=head(Forecast,1))
> > #  SKU Forecast
> > #1  A1       99
> > #2  K2      207
> > #3  X4       63
> > A.K.
> >
> >
> >
> >
> > ----- Original Message -----
> > From: Barry King <barry.king at qlx.com>
> > To: r-help at r-project.org
> > Cc:
> > Sent: Friday, March 15, 2013 1:30 PM
> > Subject: [R] Help finding first value in a BY group
> >
> > I have a large Excel file with SKU numbers (stock keeping units) and
> > forecasts which can be mimicked with the following:
> >
> > Period <- c(1, 2, 3, 1, 2, 3, 4, 1, 2)
> > SKU <- c("A1","A1","A1","X4","X4","X4","X4","K2","K2")
> > Forecast <- c(99, 103, 128, 63, 69, 72, 75, 207, 201)
> > PeriodSKUForecast <- data.frame(Period, SKU, Forecast)
> > PeriodSKUForecast
> >
> >   Period SKU Forecast
> > 1      1  A1       99
> > 2      2  A1      103
> > 3      3  A1      128
> > 4      1  X4       63
> > 5      2  X4       69
> > 6      3  X4       72
> > 7      4  X4       75
> > 8      1  K2      207
> > 9      2  K2      201
> >
> > I need to create a matrix with only the first forecast for each SKU:
> >
> > A1 99
> > X4 63
> > K2 207
> >
> > The Period for the first forecast will always be the minimum value
> > for an SKU.
> >
> > Can anyone suggest how I might accomplish this?
> >
> > Thank you,
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list