[R] by group problem

Mon Sep 3 10:51:13 CEST 2007

Hi

Now I understand better what you want.

topN.2 <- function(data, n=5) data[order(data[,3], decreasing=TRUE),][1:n, ]

# I presume data is a data frame with 3 columns and the third is the percent
# (if it is a plain list, as in your example, convert it first with as.data.frame(data))

lapply(split(data,data$state), topN.2)
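
A self-contained sketch of this approach on the sample data from your message. Two things here are my assumptions rather than part of your code: the data are built directly with data.frame() instead of as a list, and head() replaces the 1:n row index so that a state with fewer than n counties is not padded with NA rows.

```r
# Sample data from the quoted message, built as a data frame
# (assumption: the original creates a plain list via data <- NULL).
data <- data.frame(
  state = c(rep("Illinois", 10), rep("Wisconsin", 10)),
  county = c("Adams", "Brown", "Bureau", "Cass", "Champaign",
             "Christian", "Coles", "De Witt", "Douglas", "Edgar",
             "Adams", "Ashland", "Barron", "Bayfield", "Buffalo",
             "Burnett", "Chippewa", "Clark", "Columbia", "Crawford"),
  percentOld = c(17.554849, 16.826594, 18.196593, 17.139242, 8.743823,
                 17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
                 19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
                 20.337817, 14.293354, 17.252820, 15.647179, 16.825596),
  stringsAsFactors = FALSE
)

# Sort the rows by the third column (the percent), largest first,
# and keep at most the first n rows.
topN.2 <- function(d, n = 5) head(d[order(d[, 3], decreasing = TRUE), ], n)

# One data frame per state, each holding its top counties and values.
top5 <- lapply(split(data, data$state), topN.2)

# County and value only, for one state.
top5$Illinois[, c("county", "percentOld")]
```

The result is a list with one element per state, so top5$Wisconsin gives the corresponding rows for Wisconsin.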

Regards

Petr

petr.pikal at precheza.cz

"Cory Nissen" <cnissen at AkoyaInc.com> wrote on 31.08.2007 17:21:01:

> That didn't work for me...
> 
> Here's some data to help with a solution.
> 
> data <- NULL
> data$state <- c(rep("Illinois", 10), rep("Wisconsin", 10))
> data$county <- c("Adams", "Brown", "Bureau", "Cass", "Champaign", 
>                  "Christian", "Coles", "De Witt", "Douglas", "Edgar",
>                  "Adams", "Ashland", "Barron", "Bayfield", "Buffalo", 
>                  "Burnett", "Chippewa", "Clark", "Columbia", "Crawford")
> data$percentOld <- c(17.554849, 16.826594, 18.196593, 17.139242, 8.743823,
>                      17.862746, 13.747967, 16.626302, 15.258940, 18.984435,
>                      19.347022, 17.814436, 16.903067, 17.632781, 16.659305,
>                      20.337817, 14.293354, 17.252820, 15.647179, 16.825596)
> 
> return something like this...
> $Illinois
> "Edgar"
> 18.984435
> "Bureau"
> 18.196593
> ...
> $Wisconsin
> "Burnett"
> 20.33782
> "Adams"
> 19.34702
> ...
> 
> My Solution gives...
> topN <- function(column, n=5)
>   {
>     column <- sort(column, decreasing=T)
>     return(column[1:n])
>   }
> tapply(data$percentOld, data$state, topN)
> 
> $Illinois
> [1] 18.98444 18.19659 17.86275 17.55485 17.13924
> $Wisconsin
> [1] 20.33782 19.34702 17.81444 17.63278 17.25282
> 
> I get an error with this try...
> aggregate(data$percentOld, list(data$state, data$county), topN)
> 
> Error in aggregate.data.frame(as.data.frame(x), ...) : 
>  'FUN' must always return a scalar
> 
> Thanks
> 
> cn
> 
> 
> 
> From: Petr PIKAL [mailto:petr.pikal at precheza.cz]
> Sent: Fri 8/31/2007 8:15 AM
> To: Cory Nissen
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] by group problem

> Hi
> 
> > I am working with census data.  My columns of interest are...
> >
> > PercentOld - the percentage of people in each county that are over 65
> > County - the county in each state
> > State - the state in the US
> >
> > There are about 3100 rows, with each row corresponding to a county
> > within a state.
> >
> > I want to return the top five "PercentOld" by state.  But I want the
> > County and the Value.
> >
> > I tried this...
> >
> > topN <- function(column, n=5)
> >   {
> >     column <- sort(column, decreasing=T)
> >     return(column[1:n])
> >   }
> > top5PerState <- tapply(data$percentOld, data$STATE, topN)
> 
> Try
> 
> aggregate(data$PercentOld, list(data$State, data$County), topN)
> 
> Regards
> Petr
> 
> 
> >
> > But this only returns the value for "percentOld" per state, I also want
> > the corresponding County.
> >
> > I think I'm close, but I just can't get it...
> >
> > Thanks
> >
> > cn
> >