[R] Help on calculating spearman rank correlation for a data frame with conditions

David Winsemius dwinsemius at comcast.net
Wed Aug 29 06:56:21 CEST 2012


On Aug 28, 2012, at 9:20 PM, R. Michael Weylandt wrote:

> On Tue, Aug 28, 2012 at 9:01 PM, Yi <liuyi.feier at gmail.com> wrote:
>> Dear all,
>>
>> Suppose my data frame is as follows:
>>
>> id  price  distance
>> 1   2     4
>> 1   3    5
>> ...
>> 2  4   8
>> 2  5   9
>> ...
>> n  3   7
>> n   8  9
>>
>> I would like to calculate the rank-order correlation between price and
>> distance for each id.
>>
>> cor(price, distance, method = "spearman") calculates a single
>> correlation over all ids.
>>
>> Then I tried to use
>> apply(data,list='id',cor(price , distance , method = "spearman"))
>> to
>>
>
> You seem to have been cut off mid-thought,  but I'm guessing you want
> something more like:
>
> tapply(data, data$id, function(x) cor(x$price, x$distance, method =
> "spearman"))

I am dubious. tapply() expects an atomic vector, not a data frame, as
its first argument. For group-wise operations that involve more than one
vector, one generally needs lapply(split()) or by():

Here's my guess:

lapply(split(data, data$id), function(dfrm) {
  cor(x = dfrm[["price"]], y = dfrm[["distance"]], method = "spearman")
})

OR:

by(data, data$id, function(dfrm)
  cor(x = dfrm[["price"]], y = dfrm[["distance"]], method = "spearman"))
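
For anyone who wants to try the split-and-apply idea on something concrete, here is a minimal sketch. The data values are invented purely for illustration (the original post shows only a fragment of the real data), and sapply() is swapped in for lapply() only so that the result collapses to a named numeric vector rather than a list:

```r
# Toy data, made up for illustration; three rows per id
data <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2),
  price    = c(2, 3, 5, 4, 5, 7),
  distance = c(4, 5, 9, 9, 8, 6)
)

# Split the data frame by id, then compute one Spearman correlation
# per piece; sapply() simplifies the per-group results to a vector
rho <- sapply(split(data, data$id), function(dfrm) {
  cor(dfrm[["price"]], dfrm[["distance"]], method = "spearman")
})

print(rho)
```

In this toy data, price and distance rise together for id 1 and move in opposite directions for id 2, so the two coefficients should come out as 1 and -1. The names of `rho` are the id values, which makes it easy to merge the result back onto other per-id summaries.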

-- 

David Winsemius, MD
Alameda, CA, USA



