[R] Reference factors inside split

Naresh Gurbuxani naresh_gurbuxani using hotmail.com
Mon Jul 11 16:33:31 CEST 2022


This is what I was looking for.  Thanks for your quick response and elegant solution.

Naresh


> On Jul 11, 2022, at 10:00 AM, Ben Tupper <btupper using bigelow.org> wrote:
> 
> Hi,
> 
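> Inside your function the bare name account is not defined, so
> main = account fails.  (Each element of the split list still carries
> the account column, so unique(df$account) would also work.)
> Instead of iterating over the elements of the split list, you can
> iterate over the **names** of the elements.  In your case the account
> name is the grouping variable, so the list names are the account names.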
> 
> 
> ##start
> 
> library(lattice)
> mydf <- data.frame(
>  date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1,
>                      length.out = 10), 4),
>  account = c(rep("ABC", 20), rep("XYZ", 20)),
>  client = c(rep("P", 10), rep("Q", 10), rep("R", 10), rep("S", 10)),
>  profit = round(runif(40, 2, 5), 2), sale = round(runif(40, 10, 20), 2))
> 
> account.names <- data.frame(account = c("ABC", "DEF", "XYZ"),
>                            corp = c("ABC Corporation", "DEF LLC",
>                                     "XYZ Incorporated"))
> 
> mydf.split <- split(mydf, mydf$account)
> 
> myplots <- sapply(names(mydf.split),
>  function(name, x = NULL) {
>    df <- x[[name]]
>    myts <- aggregate(sale ~ date, FUN = sum, data = df)
>    xyplot(sale ~ date, data = myts, main = name)
>  }, x = mydf.split, USE.NAMES = TRUE, simplify = FALSE)
> 
> myplots[["ABC"]]
> myplots[["XYZ"]]
> 
> ## end
> 
> Does that help?
> 
>> On Mon, Jul 11, 2022 at 9:14 AM Naresh Gurbuxani
>> <naresh_gurbuxani using hotmail.com> wrote:
>> 
>> 
>> I want to split my dataframe according to a list of factors.  Then, in
>> the resulting list, I want to reference the factors used in split.  Is
>> it possible?
>> 
>> Thanks,
>> Naresh
>> 
>> library(lattice)
>> mydf <- data.frame(
>> date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1, length.out =
>> 10), 4),
>> account = c(rep("ABC", 20), rep("XYZ", 20)),
>> client = c(rep("P", 10), rep("Q", 10), rep("R", 10), rep("S", 10)),
>> profit = round(runif(40, 2, 5), 2), sale = round(runif(40, 10, 20), 2))
>> 
>> account.names <- data.frame(account = c("ABC", "DEF", "XYZ"),
>> corp = c("ABC Corporation", "DEF LLC", "XYZ Incorporated"))
>> 
>> mydf.split <- split(mydf, mydf$account)
>> 
>> # This does not work: account is not a variable inside the function
>> myplots <- lapply(mydf.split, function(df) {
>> myts <- aggregate(sale ~ date, FUN = sum, data = df)
>> xyplot(sale ~ date, data = myts, main = account)})
>> 
>> # This works, but may have a large overhead
>> mydf <- merge(mydf, account.names, by = "account", all.x = TRUE)
>> mydf.split <- split(mydf, mydf$account)
>> myplots <- lapply(mydf.split, function(df) {
>> myts <- aggregate(sale ~ date, FUN = sum, data = df)
>> # corp lives in df; the aggregated myts has only date and sale
>> xyplot(sale ~ date, data = myts, main = unique(df$corp))})
>> 
>> # Now I can print one plot at a time
>> myplots[["ABC"]]
>> myplots[["XYZ"]]
>> 
> 
> 
> 
> -- 
> Ben Tupper (he/him)
> Bigelow Laboratory for Ocean Science
> East Boothbay, Maine
> http://www.bigelow.org/
> https://eco.bigelow.org
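
The names-based approach above can also produce the full corporation name as the plot title without merging it into the data first. The following is only a minimal sketch under stated assumptions: corp.lookup is a hypothetical helper name, not part of the original code; Map() is swapped in for sapply() so that each data frame and its name are passed together; and set.seed() is there only to make the random illustrative data reproducible.

```r
library(lattice)

set.seed(1)  # assumption: only to make the illustrative data reproducible
mydf <- data.frame(
  date = rep(seq.Date(from = as.Date("2022-06-01"), by = 1,
                      length.out = 10), 4),
  account = c(rep("ABC", 20), rep("XYZ", 20)),
  client = c(rep("P", 10), rep("Q", 10), rep("R", 10), rep("S", 10)),
  profit = round(runif(40, 2, 5), 2),
  sale = round(runif(40, 10, 20), 2))

account.names <- data.frame(
  account = c("ABC", "DEF", "XYZ"),
  corp = c("ABC Corporation", "DEF LLC", "XYZ Incorporated"))

mydf.split <- split(mydf, mydf$account)

# Hypothetical helper: named character vector, account code -> corporation name
corp.lookup <- setNames(as.character(account.names$corp),
                        as.character(account.names$account))

# Map() walks the data frames and their names in parallel; the result
# keeps the names of the first list argument (here "ABC" and "XYZ")
myplots <- Map(function(df, name) {
  myts <- aggregate(sale ~ date, FUN = sum, data = df)
  xyplot(sale ~ date, data = myts, main = corp.lookup[[name]])
}, mydf.split, names(mydf.split))

myplots[["ABC"]]
myplots[["XYZ"]]
```

Because split() keeps every column of the data frame, looking the title up with as.character(df$account[1]) inside a plain lapply() should work equally well; passing the name explicitly just makes the dependence on the grouping value visible.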

