[R] "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?

Sat Apr 9 07:16:55 CEST 2022

On Sat, Apr 9, 2022 at 6:56 AM Kelly Thompson <kt1572757 using gmail.com> wrote:
>
> # Q. How can I "apply" a function that takes two or more vectors as
> # arguments, such as cor(x, y), over a "category", "grouping variable",
> # or "index"?
> # I'm using cor() as an example; I'd like a way to do this for any
> # function that takes two or more vectors as arguments.
>
> #create example data
>
> my_category <- rep(c("a", "b", "c"), 4)
>
> set.seed(12345)
> my_x <- rnorm(12)
>
> set.seed(54321)
> my_y <- rnorm(12)
>
> my_df <- data.frame(my_category, my_x, my_y)
>
> #review data
> my_df
>
> # If I wanted to get the correlation of x and y grouped by category,
> # I could use this code and loop:
>
> my_category_unique <- unique(my_category)
>
> my_results <- vector("list", length(my_category_unique) )
> names(my_results) <- my_category_unique
>
> # start i loop
> for (i in seq_along(my_category_unique)) {
>   # logical index of the rows belonging to the i-th category
>   my_criteria_i <- my_category == my_category_unique[i]
>   my_x_i <- my_x[my_criteria_i]
>   my_y_i <- my_y[my_criteria_i]
>   my_correl_i <- cor(x = my_x_i, y = my_y_i)
>   my_results[[i]] <- my_correl_i
> } # end i loop
>
> #review results
> my_results
>
> # Q. Is there a better or more "elegant" way to do this, using by(),
> # aggregate(), apply(), or some other function?

split() is another generally useful function to know about. For example:

s <- split(my_df, ~ my_category)
lapply(s, function(d) with(d, cor(my_x, my_y)))
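
The same idea can be sketched in a couple of other ways. This is only an illustration under some assumptions: the data are recreated from the original post, the names cor_by_group and cor_by_group2 are made up here, and split() is called with a grouping vector rather than a formula, since (as far as I know) the formula form needs R >= 4.1.0. vapply() gives a named numeric vector instead of a list, and mapply() walks several split vectors in parallel, which extends to functions that take more than two vectors.

```r
# Example data, recreated from the original post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

# Split the rows by group, using a grouping vector instead of a formula
s <- split(my_df, my_df$my_category)

# vapply() checks that each result is a single number and returns
# a named numeric vector (names are the group labels)
cor_by_group <- vapply(s, function(d) cor(d$my_x, d$my_y), numeric(1))
cor_by_group

# Alternative: split each vector on its own and pass the pieces to the
# function in parallel; add further split() calls for more arguments
cor_by_group2 <- mapply(cor,
                        split(my_x, my_category),
                        split(my_y, my_category))
cor_by_group2
```

Both results should agree with the loop in the original post, one value per category.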

Best,
-Deepayan

> # This does not work and results in this error message:
> # "Error in FUN(dd[x, ], ...) : incompatible dimensions"
> by(data = my_x, INDICES = my_category, FUN = cor, y = my_y)
>
> # This does not work and results in this error message:
> # "Error in cor(my_df$x, my_df$y) : ... supply both 'x' and 'y' or a
> # matrix-like 'x'"
> by(data = my_df, INDICES = my_category,
>    FUN = function(x, y) { cor(my_df$x, my_df$y) })
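
A likely explanation for the two failures, offered as my reading rather than a definitive diagnosis: in the first call, only my_x is subset by group, while y = my_y is passed through whole, so cor() sees 4 values against 12. In the second, by() calls FUN with a single argument (the rows for the current group), but the function ignores it and refers to my_df$x and my_df$y, which do not exist because the columns are named my_x and my_y. A minimal sketch of a version that should work, with res_by and res_idx as made-up names:

```r
# Example data, recreated from the original post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

# by() hands FUN one argument: the data frame of rows for the current
# group. Use that argument (d) and the actual column names.
res_by <- by(my_df, my_df$my_category, function(d) cor(d$my_x, d$my_y))
res_by

# Without a data frame: group the row indices, then subset every
# vector with the same indices inside the function
res_idx <- tapply(seq_along(my_x), my_category,
                  function(i) cor(my_x[i], my_y[i]))
res_idx
```

The index-based form is handy when the vectors do not already live in one data frame.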
>
>
> # If I wanted the mean of x by category, I could use by() or aggregate():
> by(data = my_x, INDICES = my_category, FUN = mean)
>
> aggregate(x = my_x, by = list(my_category), FUN = mean)
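
For comparison, the single-vector case fits the same split-then-apply pattern. The sketch below assumes the example data from the post; mean_tapply, mean_split and mean_agg are names made up for illustration.

```r
# Example data, recreated from the original post
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
my_df <- data.frame(my_category, my_x)

# tapply(): named array with one mean per category
mean_tapply <- tapply(my_x, my_category, mean)

# split() + sapply(): same values as a named numeric vector
mean_split <- sapply(split(my_x, my_category), mean)

# aggregate() with a formula: result is a data frame with columns
# my_category and my_x
mean_agg <- aggregate(my_x ~ my_category, data = my_df, FUN = mean)
mean_agg
```

tapply() and aggregate() only subset one vector at a time, which is why the two-vector case needs split(), by() on a data frame, or grouped indices.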
>
> #Thanks!