[R] "apply" a function that takes two or more vectors as arguments, such as cor(x, y), over a "category" or "grouping variable" or "index"?

Rui Barradas ruipbarradas at sapo.pt
Sat Apr 9 19:03:42 CEST 2022


Hello,



At 17:50 on 09/04/2022, Rui Barradas wrote:
> Hello,
> 
> Yes, that's possible. Must by() will still pass only one object to the 

Typo: "But" ------------^^^^

Rui Barradas

> function. Then, in the function, process this object's columns.
> 
> 
> by(my_df[-1], my_df$my_category, \(x) udf_x_plus_y(x[[1]], x[[2]]))
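
A minimal sketch of the suggestion above, assuming the columns are named my_x and my_y as in the original post. by() hands each group's rows to the function as a single data frame (called d here, an illustrative name), and the wrapper pulls the columns out by name before forwarding them as separate arguments. The do.call() variant at the end is an assumption added for functions with more than two arguments; it is not something proposed in the thread.

```r
# Rebuild the example data from the original post so this runs on its own
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

udf_x_plus_y <- function(x, y) x + y

# by() calls FUN once per group with that group's rows as one data frame (d);
# the wrapper extracts the columns and passes them as separate arguments
res_sum <- by(my_df[-1], my_df$my_category,
              function(d) udf_x_plus_y(x = d$my_x, y = d$my_y))

# The same pattern with cor(): one correlation per category
res_cor <- by(my_df[-1], my_df$my_category,
              function(d) cor(x = d$my_x, y = d$my_y))

# Hypothetical generalisation: pass every column of d, in order, as a
# positional argument (usable with functions of two or more vectors)
res_any <- by(my_df[-1], my_df$my_category,
              function(d) do.call(udf_x_plus_y, unname(as.list(d))))
```

Extracting columns by name keeps the call readable; the positional do.call() form depends on the column order of the data frame, so it is only safe when that order matches the function's arguments.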
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> At 17:36 on 09/04/2022, Kelly Thompson wrote:
>> Thanks. I have a clarification and a follow-up question. I should have
>> asked this in the original post and provided a better example for the
>> FUN argument; I apologize.
>>
>> For use in an example, here is a "silly" example of a function that
>> requires arguments such as x and y to be "separately assigned" :
>>
>> udf_x_plus_y <- function (x, y) { return ( x + y) }
>>
>> Q. Is there a way to use by() when the argument of FUN is a function
>> that requires arguments such as "x" and "y" to be separately assigned
>> (e.g. udf_x_plus_y(x = my_x, y = my_y)), rather than assigned as a
>> range of columns using brackets (e.g. cor(x)[1,2])?
>>
>> Something like this perhaps? (This produces an error message.)
>> by( data = my_df[-1], INDICES = my_df$my_category,  FUN = function(x,
>> y) { udf_x_plus_y (x = data$my_x, y = data$my_y) } )
>>
>> Thanks again.
>>
>> On Sat, Apr 9, 2022 at 5:32 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:
>>>
>>> Hello,
>>>
>>> Another option is ?by.
>>>
>>>
>>> by(my_df[-1], my_df$my_category, cor)
>>> by(my_df[-1], my_df$my_category, \(x) cor(x)[1,2])
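
For reference, a small sketch of what these two calls compute, reusing the example data from the original post: with cor as the function, each category yields a full 2 x 2 correlation matrix, and [1,2] picks out the my_x/my_y entry. The split()/sapply() lines are an alternative added here for comparison, not part of the reply; they return a plain named numeric vector rather than a "by" object.

```r
# Example data from the original post, repeated so the snippet is self-contained
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

# cor() on a two-column data frame returns a 2 x 2 matrix;
# element [1, 2] is the correlation between my_x and my_y
cor_mat_a <- cor(my_df[my_df$my_category == "a", -1])
cor_mat_a[1, 2]

# Alternative (not from the thread): split the data frame by category and
# collect one number per group into a named numeric vector
groups <- split(my_df[-1], my_df$my_category)
cors <- sapply(groups, function(d) cor(d)[1, 2])
cors
```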
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>> At 02:26 on 09/04/2022, Kelly Thompson wrote:
>>>> #Q. How can I "apply" a function that takes two or more vectors as
>>>> #arguments, such as cor(x, y), over a "category" or "grouping variable"
>>>> #or "index"?
>>>> #I'm using cor() as an example; I'd like to find a way to do this for
>>>> #any function that takes 2 or more vectors as arguments.
>>>>
>>>>
>>>> #create example data
>>>>
>>>> my_category <- rep ( c("a","b","c"),  4)
>>>>
>>>> set.seed(12345)
>>>> my_x <- rnorm(12)
>>>>
>>>> set.seed(54321)
>>>> my_y <- rnorm(12)
>>>>
>>>> my_df <- data.frame(my_category, my_x, my_y)
>>>>
>>>> #review data
>>>> my_df
>>>>
>>>> #If I wanted to get the correlation of x and y grouped by category, I
>>>> #could use this code and loop:
>>>>
>>>> my_category_unique <- unique(my_category)
>>>>
>>>> my_results <- vector("list", length(my_category_unique) )
>>>> names(my_results) <- my_category_unique
>>>>
>>>> #start i loop
>>>> for (i in seq_along(my_category_unique)) {
>>>>   my_criteria_i <- my_category == my_category_unique[i]
>>>>   my_x_i <- my_x[my_criteria_i]
>>>>   my_y_i <- my_y[my_criteria_i]
>>>>   my_correl_i <- cor(x = my_x_i, y = my_y_i)
>>>>   my_results[[i]] <- my_correl_i
>>>> } # end i loop
>>>>
>>>> #review results
>>>> my_results
>>>>
>>>> #Q. Is there a better or more "elegant" way to do this, using by(),
>>>> #aggregate(), apply(), or some other function?
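
One possible answer, sketched under the assumption that the inputs are bare vectors of equal length as in the example data: split() each vector by the grouping variable, then let mapply() walk the pieces in parallel (or Map(), when the result per group is not a single number). The helper name apply_by_group is hypothetical and only illustrates how the idea extends to any number of vectors.

```r
# Example data, repeated so the snippet runs on its own
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)

# split() cuts each vector into one piece per category (same group order),
# and mapply() calls cor(x_piece, y_piece) for each pair of pieces
cor_by_cat <- mapply(cor, split(my_x, my_category), split(my_y, my_category))

# Hypothetical helper: apply FUN to any number of vectors, group by group.
# Map() keeps the per-group results in a list, so FUN may return vectors.
apply_by_group <- function(FUN, group, ...) {
  pieces <- lapply(list(...), split, f = group)
  do.call(Map, c(list(f = FUN), pieces))
}

sum_by_cat <- apply_by_group(function(x, y) x + y, my_category, my_x, my_y)
```

mapply() simplifies scalar results into a named vector, which suits cor(); the Map()-based helper returns a list, which is the safer default when the function's output length is not known in advance.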
>>>>
>>>> #This does not work and results in this error message: "Error in
>>>> #FUN(dd[x, ], ...) : incompatible dimensions"
>>>> by (data = my_x, INDICES = my_category, FUN = cor, y = my_y)
>>>>
>>>> #This does not work and results in this error message: "Error in
>>>> #cor(my_df$x, my_df$y) : ... supply both 'x' and 'y' or a matrix-like
>>>> #'x' "
>>>> by (data = my_df, INDICES = my_category, FUN = function(x, y) { cor
>>>> (my_df$x, my_df$y) } )
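
A short sketch of why both attempts fail, as far as can be told from the error messages. In the first, by() subsets only my_x per category while y = my_y is forwarded unchanged, so cor() receives 4 values against 12. In the second, the function ignores its argument and looks up my_df$x and my_df$y, which do not exist (the columns are my_x and my_y). The index-based tapply() call at the end is one possible workaround added here, not something proposed in the thread.

```r
# Example data, repeated so the snippet runs on its own
my_category <- rep(c("a", "b", "c"), 4)
set.seed(12345)
my_x <- rnorm(12)
set.seed(54321)
my_y <- rnorm(12)
my_df <- data.frame(my_category, my_x, my_y)

# Attempt 1: record the lengths cor() would have received in each group.
# x is subset per category, but y arrives whole.
lens <- by(data = my_x, INDICES = my_category,
           FUN = function(x, y) c(x = length(x), y = length(y)), y = my_y)
lens[[1]]

# Attempt 2: these columns do not exist, so both lookups give NULL
is.null(my_df$x)
is.null(my_df$y)

# Workaround: group the row indices instead of the data, then use the
# indices to subset every vector consistently inside the function
cor_by_cat <- tapply(seq_along(my_x), my_category,
                     function(i) cor(x = my_x[i], y = my_y[i]))
cor_by_cat
```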
>>>>
>>>>
>>>> #If I wanted the mean of x by category, I could use by() or
>>>> #aggregate():
>>>> by (data = my_x, INDICES = my_category, FUN = mean)
>>>>
>>>> aggregate(x = my_x, by = list(my_category), FUN = mean)
>>>>
>>>> #Thanks!
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.


