[R] By() with method = spearman

Ivar Herfindal ivar.herfindal at bio.ntnu.no
Wed Sep 19 18:47:02 CEST 2007


Hello

It seems that method="pearson" tolerates groups with no complete cases 
and simply returns NA for their correlation coefficients, whereas 
method="spearman" gives an error message. Try, e.g.:
 
testdata <- cbind.data.frame(gr=rep(letters[1:4], each=5), aa=rnorm(20), 
bb=rnorm(20))
testdata[1:5, 2] <- NA
by(testdata[,c("aa", "bb")], testdata$gr, cor, use="complete", 
method="pearson")
# provides result for every group, but NA for group a

by(testdata[,c("aa", "bb")], testdata$gr, cor, use="complete", 
method="spearman")
# gives:
Error in FUN(data[x, ], ...) : 'x' is empty

I guess a deeper look into cor() would reveal what is actually going on 
with spearman vs. pearson vs. kendall (kendall also gives an error message).
Anyway, this leads me to believe that some of your groups may have no 
complete pairs.
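
One way to check this is to count the complete pairs in each group before calling cor(). The following is only a minimal sketch, assuming data shaped like the testdata example above; safe_cor() is a hypothetical helper written for illustration, not a function that ships with R.

```r
# Hypothetical data mirroring the example above: group "a" has no complete pairs.
set.seed(1)
testdata <- data.frame(gr = rep(letters[1:4], each = 5),
                       aa = rnorm(20), bb = rnorm(20))
testdata[1:5, "aa"] <- NA

# Count the complete (aa, bb) pairs within each group.
n_complete <- tapply(complete.cases(testdata[, c("aa", "bb")]),
                     testdata$gr, sum)
n_complete

# safe_cor() is a hypothetical helper (an assumption, not part of R):
# it drops incomplete rows itself and returns NA when fewer than two
# complete pairs remain, instead of letting cor() fail.
safe_cor <- function(d, method = "spearman") {
  d <- d[complete.cases(d), , drop = FALSE]
  if (nrow(d) < 2) return(NA_real_)
  cor(d[[1]], d[[2]], method = method)
}

# One coefficient per group; groups without complete pairs give NA.
res <- sapply(split(testdata[, c("aa", "bb")], testdata$gr), safe_cor)
res
```

Groups showing a count of 0 in n_complete are the likely culprits. The wrapper gives spearman the same "NA instead of an error" behaviour described above for pearson.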

Sincerely

Ivar

Doran, Harold wrote:
> I still get an error
>
>
> > tmp$Grade <- factor(tmp$Grade)
> > lapply(split(tmp, f = tmp$Grade),
> + function(x){cor(x[,c("mtsc07","DCBASmathscoreSPRING")], use='complete',
> + method='spearman')})
>
> Error in cor(x[, c("mtsc07", "DCBASmathscoreSPRING")], use = "complete", :
>         'x' is empty
>
> I noticed tmp$Grade (my index variable) was numeric. So, I coerced it
> into a factor. I get the same error message, however.
>
> Notice, however, that this code works correctly
>
> lapply(split(tmp, f = tmp$Grade),
> function(x){cor(x[,c("mtsc07","DCBASmathscoreSPRING")], use='complete',
> method='pearson')})
>
> The only difference is that method is changed to pearson.
>
>   
>> -----Original Message-----
>> From: Chuck Cleland [mailto:ccleland at optonline.net] 
>> Sent: Wednesday, September 19, 2007 12:22 PM
>> To: Doran, Harold
>> Subject: Re: [R] By() with method = spearman
>>
>> Doran, Harold wrote:
>>> Thanks, Chuck. Seems odd though, doesn't it? There must be something
>>> with my data set. But, I don't have any clue what it might be since I
>>> can compute pearson using by() and I can subset and actually compute
>>> spearman using just cor()
>>
>> Harold:
>>   What happens when you approach the problem with split() and 
>> lapply() instead of by()?  For example:
>>
>> lapply(split(iris, f = iris$Species),
>> function(x){cor(x[,c("Sepal.Length","Sepal.Width")], use='complete',
>> method='spearman')})
>>
>> $setosa
>>              Sepal.Length Sepal.Width
>> Sepal.Length    1.0000000   0.7553375
>> Sepal.Width     0.7553375   1.0000000
>>
>> $versicolor
>>              Sepal.Length Sepal.Width
>> Sepal.Length     1.000000    0.517606
>> Sepal.Width      0.517606    1.000000
>>
>> $virginica
>>              Sepal.Length Sepal.Width
>> Sepal.Length    1.0000000   0.4265165
>> Sepal.Width     0.4265165   1.0000000
>>
>> hope this helps,
>>
>> Chuck
>>
>>     
>>>> -----Original Message-----
>>>> From: Chuck Cleland [mailto:ccleland at optonline.net]
>>>> Sent: Wednesday, September 19, 2007 12:14 PM
>>>> To: Doran, Harold
>>>> Cc: r-help at r-project.org
>>>> Subject: Re: [R] By() with method = spearman
>>>>
>>>> Doran, Harold wrote:
>>>>> I have a data set where I want the correlations between 2 variables
>>>>> conditional on a student's grade level.
>>>>>
>>>>> This code works just fine.
>>>>>
>>>>> by(tmp[,c('mtsc07', 'DCBASmathscoreSPRING')], tmp$Grade, cor, 
>>>>> use='complete', method='pearson')
>>>>>
>>>>> However, this generates an error
>>>>>
>>>>> by(tmp[,c('mtsc07', 'DCBASmathscoreSPRING')], tmp$Grade, cor, 
>>>>> use='complete', method='spearman')
>>>>> Error in FUN(data[x, ], ...) : 'x' is empty
>>>>>
>>>>> I can subset the data by grade and compute spearman rho as
>>>>>
>>>>> tmp5 <- subset(tmp, Grade == 5)
>>>>> cor(tmp5[,c('mtsc07', 'DCBASmathcountSPRING')], use='complete',
>>>>> method='spearman')
>>>>>
>>>>> But doing this iteratively is inefficient.
>>>>>
>>>>> I don't see anything in the help man for by() or cor() that tells me
>>>>> what the problem is. I might be missing it though. Any thoughts?
>>>>
>>>>   It works as expected using the iris data:
>>>>
>>>> by(iris[,c('Sepal.Length', 'Sepal.Width')], iris$Species, cor,
>>>>    use='complete', method='spearman')
>>>>
>>>> iris$Species: setosa
>>>>              Sepal.Length Sepal.Width
>>>> Sepal.Length    1.0000000   0.7553375
>>>> Sepal.Width     0.7553375   1.0000000
>>>> --------------------------------------------------------------
>>>> -------------------------------------------------------
>>>>
>>>> iris$Species: versicolor
>>>>              Sepal.Length Sepal.Width
>>>> Sepal.Length     1.000000    0.517606
>>>> Sepal.Width      0.517606    1.000000
>>>> --------------------------------------------------------------
>>>> -------------------------------------------------------
>>>>
>>>> iris$Species: virginica
>>>>              Sepal.Length Sepal.Width
>>>> Sepal.Length    1.0000000   0.4265165
>>>> Sepal.Width     0.4265165   1.0000000
>>>>
>>>> > sessionInfo()
>>>> R version 2.5.1 Patched (2007-09-16 r42884)
>>>> i386-pc-mingw32
>>>>
>>>> locale:
>>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>>>> States.1252;LC_MONETARY=English_United
>>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"
>>>> "methods"   "base"
>>>>
>>>> other attached packages:
>>>>  lattice
>>>> "0.16-5"
>>>>
>>>>         
>>>>> Thanks,
>>>>> Harold
>>>>>
>>>>>
>>>>> > sessionInfo()
>>>>> R version 2.5.0 (2007-04-23)
>>>>> i386-pc-mingw32
>>>>>
>>>>> locale:
>>>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>>>>> States.1252;LC_MONETARY=English_United
>>>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"
>>>>> "methods"   "base"     
>>>>>
>>>>> other attached packages:
>>>>>  lattice
>>>>> "0.15-4" 
>>>>>
>>>>>
>>>>> 	[[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>> --
>>>> Chuck Cleland, Ph.D.
>>>> NDRI, Inc.
>>>> 71 West 23rd Street, 8th floor
>>>> New York, NY 10010
>>>> tel: (212) 845-4495 (Tu, Th)
>>>> tel: (732) 512-0171 (M, W, F)
>>>> fax: (917) 438-0894
>>>>
>>>>         