[R] a question about "by" and "ddply"

Wed May 30 07:04:37 CEST 2012

On Wed, May 30, 2012 at 12:58 AM, David Winsemius
<dwinsemius at comcast.net> wrote:
>
> On May 29, 2012, at 6:32 PM, jacaranda tree wrote:
>
>> Hi all,
>> I have a data set (df, n=10 for the sake of simplicity here) where I have
>> two continuous variables (age and weight) and I also have a grouping
>> variable (group, with two levels). I want to run correlations for each group
>> separately (kind of similar to "split file" in SPSS). I've been
>> experimenting with different functions, and I was able to do this correctly
>> using ddply function, but output is a little bit difficult to read when I do
>> the cor.test to get all the data with p values, df, and pearson r (see
>> below). I also tried to do it with by function. Although, with by, it shows
>> the data for two groups separately, it seems like it calculates the same r
>> for both groups. Here is my code for both ddply and by, and the output as
>> well. I was wondering if there is a way to display the output better with
>> ddply or run the correlations correctly for each group using by.
>> Thanks in advance,
>>
>
> I would have imagined something along the lines of
>
> lapply( split( df, df$group, function(x) cor.test(x[["age"]], x[["weight")]
> )

lapply( split( df, df$group), function(x) cor.test(x[["age"]], x[["weight"]]) )

is what David meant, I'd imagine (I've been hunting down missing
parentheses all night, so excuse the pedantry).

Repeating David's disclaimer "... but without an example it's only a guess."
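
For what it's worth, here is a minimal sketch with made-up data (the data
frame below is hypothetical, since no example was posted). It also suggests
why the by() call in the original post reports the same r for both groups:
the anonymous function never uses its argument x, so cor.test() picks up
age and weight from outside the function (presumably attached or in the
workspace) and sees all 10 rows each time, hence df = 8 in both outputs.

```r
# Hypothetical data: the original post did not include any.
df <- data.frame(
  group  = rep(1:2, each = 5),
  age    = c(21, 25, 30, 34, 40, 22, 27, 31, 36, 45),
  weight = c(60, 64, 70, 69, 80, 55, 58, 66, 65, 77)
)

# One cor.test per group: each piece of the split is a data frame.
tests <- lapply(split(df, df$group),
                function(x) cor.test(x[["age"]], x[["weight"]]))

# Same idea with by(): the function has to use its argument x,
# otherwise every group is tested on the full age/weight vectors.
tests_by <- by(df, df$group,
               function(x) cor.test(x$age, x$weight, method = "pearson"))

# Keep only r, df and the p-value, one row per group, for readable output.
res <- do.call(rbind, lapply(names(tests), function(g) {
  ct <- tests[[g]]
  data.frame(group = g,
             r     = unname(ct$estimate),
             df    = unname(ct$parameter),
             p     = ct$p.value)
}))
print(res)
```

The same extraction ($estimate, $parameter, $p.value) should also tidy up
the ddply output, since cor.test() returns a list and summarise is
stacking every element of it into one column.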

Best,
M

>
> ... but without an example it's only a guess.
>
> --
> David
>
>> 1. with "ddply"
>> r<-ddply(df, .(group), summarise, "corr" = cor.test(age, weight, method =
>> "pearson"))
>>
>> Output:
>>   Group                                 corr
>> 1      1                                  Inf
>> 2      1                                    3
>> 3      1                                    0
>> 4      1                                    1
>> 5      1                                    0
>> 6      1                            two.sided
>> 7      1 Pearson's product-moment correlation
>> 8      1                       age and weight
>> 9      1                                 1, 1
>> 10     2                             9.722211
>> 11     2                                    3
>> 12     2                          0.002311412
>> 13     2                            0.9844986
>> 14     2                                    0
>> 15     2                            two.sided
>> 16     2 Pearson's product-moment correlation
>> 17     2                       age and weight
>> 18     2                 0.7779640, 0.9990233
>>
>> 2. with "by"
>> r <- by(df, group, FUN = function(x) cor.test(age, weight, method =
>> "pearson"))
>>
>> Output:
>> Group: 1
>>
>>        Pearson's product-moment correlation
>>
>> data:  age and weight
>> t = 6.4475, df = 8, p-value = 0.0001988
>> alternative hypothesis: true correlation is not equal to 0
>> 95 percent confidence interval:
>>  0.6757758 0.9802100
>> sample estimates:
>>      cor
>> 0.9157592
>>
>> ------------------------------------------------------------
>> Group: 2
>>
>>        Pearson's product-moment correlation
>>
>> data:  age and weight
>> t = 6.4475, df = 8, p-value = 0.0001988
>> alternative hypothesis: true correlation is not equal to 0
>> 95 percent confidence interval:
>>  0.6757758 0.9802100
>> sample estimates:
>>      cor
>> 0.9157592
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.