[R] By() with method = spearman

Sundar Dorai-Raj sundar.dorai-raj at pdf.com
Wed Sep 19 18:49:43 CEST 2007


Hi, Harold,

In cases like this I usually add some print info to the function. E.g.

tmp$Grade <- factor(tmp$Grade)
lapply(split(tmp, f = tmp$Grade),
        function(x) {
          z <- x[,c("mtsc07","DCBASmathscoreSPRING")]
          print(levels(factor(x$Grade)))  # Grade lives in x; z holds only the two score columns
          print(summary(z))
          print(mean(is.na(rowSums(z))))
          cor(z, use='complete', method='spearman')
        })
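
A more compact check, as a rough sketch, is to count the complete rows per grade directly. The data frame toy below is invented for illustration and only stands in for tmp; the column names are the ones from your post.

```r
# Toy stand-in for tmp (values invented for illustration)
toy <- data.frame(
  Grade = c(3, 3, 3, 4, 4, 4),
  mtsc07 = c(10, 12, NA, NA, NA, NA),
  DCBASmathscoreSPRING = c(5, 7, 9, 1, 2, 3)
)
vars <- c("mtsc07", "DCBASmathscoreSPRING")

# Number of rows per grade with no missing value in either column
n_complete <- sapply(split(toy[, vars], toy$Grade),
                     function(x) sum(complete.cases(x)))
print(n_complete)
```

Any grade that shows 0 here has nothing left once incomplete rows are dropped.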

My guess is that one level produces no data when using "complete". I can 
reproduce the error with:

na <- rep(NA, 10)
z <- matrix(c(1:10, na, na, 1:10), 20, 2)
cor(z, use = "complete", method = "spearman")
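
If that turns out to be the cause, one possible workaround is to guard the call so that a group without enough complete rows yields NA instead of an error. This is only a sketch: safe_spearman is a hypothetical helper name, the cutoff of 2 complete rows is my assumption, and the data frame d is made up.

```r
# Hypothetical helper (not part of base R): Spearman correlation that
# returns an NA matrix when a group has fewer than 2 complete rows.
safe_spearman <- function(x) {
  x <- as.matrix(x)
  ok <- complete.cases(x)
  if (sum(ok) < 2) {
    return(matrix(NA_real_, ncol(x), ncol(x),
                  dimnames = list(colnames(x), colnames(x))))
  }
  cor(x[ok, , drop = FALSE], method = "spearman")
}

# Made-up data: grade 4 has no complete rows at all
d <- data.frame(
  Grade = rep(c(3, 4), each = 5),
  a = c(1, 3, 2, 5, 4, NA, NA, NA, NA, NA),
  b = c(2, 1, 4, 3, 5, 1, 2, 3, 4, 5)
)
res <- lapply(split(d[, c("a", "b")], d$Grade), safe_spearman)
print(res)
```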

HTH,

--sundar

Doran, Harold said the following on 9/19/2007 9:30 AM:
> I still get an error
> 
> > tmp$Grade <- factor(tmp$Grade)
> > lapply(split(tmp, f = tmp$Grade),
> +   function(x){cor(x[,c("mtsc07","DCBASmathscoreSPRING")], use='complete',
> +   method='spearman')})
> 
> Error in cor(x[, c("mtsc07", "DCBASmathscoreSPRING")], use = "complete",  :
>         'x' is empty
> 
> I noticed tmp$Grade (my index variable) was numeric. So, I coerced it
> into a factor. I get the same error message, however.
> 
> Notice, however, that this code works correctly
> 
> lapply(split(tmp, f = tmp$Grade),
> function(x){cor(x[,c("mtsc07","DCBASmathscoreSPRING")], use='complete',
> method='pearson')})
> 
> The only difference is that method is changed to pearson.
> 
>> -----Original Message-----
>> From: Chuck Cleland [mailto:ccleland at optonline.net] 
>> Sent: Wednesday, September 19, 2007 12:22 PM
>> To: Doran, Harold
>> Subject: Re: [R] By() with method = spearman
>>
>> Doran, Harold wrote:
>>> Thanks, Chuck. Seems odd though, doesn't it? There must be something
>>> with my data set. But, I don't have any clue what it might be since I
>>> can compute pearson using by() and I can subset and actually compute
>>> spearman using just cor()
>> Harold:
>>   What happens when you approach the problem with split() and 
>> lapply() instead of by()?  For example:
>>
>> lapply(split(iris, f = iris$Species),
>> function(x){cor(x[,c("Sepal.Length","Sepal.Width")], use='complete',
>> method='spearman')})
>>
>> $setosa
>>              Sepal.Length Sepal.Width
>> Sepal.Length    1.0000000   0.7553375
>> Sepal.Width     0.7553375   1.0000000
>>
>> $versicolor
>>              Sepal.Length Sepal.Width
>> Sepal.Length     1.000000    0.517606
>> Sepal.Width      0.517606    1.000000
>>
>> $virginica
>>              Sepal.Length Sepal.Width
>> Sepal.Length    1.0000000   0.4265165
>> Sepal.Width     0.4265165   1.0000000
>>
>> hope this helps,
>>
>> Chuck
>>
>>>> -----Original Message-----
>>>> From: Chuck Cleland [mailto:ccleland at optonline.net]
>>>> Sent: Wednesday, September 19, 2007 12:14 PM
>>>> To: Doran, Harold
>>>> Cc: r-help at r-project.org
>>>> Subject: Re: [R] By() with method = spearman
>>>>
>>>> Doran, Harold wrote:
>>>>> I have a data set where I want the correlations between 2 variables
>>>>> conditional on a student's grade level.
>>>>>
>>>>> This code works just fine.
>>>>>
>>>>> by(tmp[,c('mtsc07', 'DCBASmathscoreSPRING')], tmp$Grade, cor, 
>>>>> use='complete', method='pearson')
>>>>>
>>>>> However, this generates an error
>>>>>
>>>>> by(tmp[,c('mtsc07', 'DCBASmathscoreSPRING')], tmp$Grade, cor,
>>>>> use='complete', method='spearman')
>>>>> Error in FUN(data[x, ], ...) : 'x' is empty
>>>>>
>>>>> I can subset the data by grade and compute spearman rho as
>>>>>
>>>>> tmp5 <- subset(tmp, Grade == 5)
>>>>> cor(tmp5[,c('mtsc07', 'DCBASmathcountSPRING')], use='complete',
>>>>> method='spearman')
>>>>>
>>>>> But doing this iteratively is inefficient.
>>>>>
>>>>> I don't see anything in the help pages for by() or cor() that tells me
>>>>> what the problem is. I might be missing it though. Any thoughts?
>>>>   It works as expected using the iris data:
>>>>
>>>> by(iris[,c('Sepal.Length', 'Sepal.Width')], iris$Species, cor,
>>>>    use='complete', method='spearman')
>>>>
>>>> iris$Species: setosa
>>>>              Sepal.Length Sepal.Width
>>>> Sepal.Length    1.0000000   0.7553375
>>>> Sepal.Width     0.7553375   1.0000000
>>>> --------------------------------------------------------------
>>>> -------------------------------------------------------
>>>>
>>>> iris$Species: versicolor
>>>>              Sepal.Length Sepal.Width
>>>> Sepal.Length     1.000000    0.517606
>>>> Sepal.Width      0.517606    1.000000
>>>> --------------------------------------------------------------
>>>> -------------------------------------------------------
>>>>
>>>> iris$Species: virginica
>>>>              Sepal.Length Sepal.Width
>>>> Sepal.Length    1.0000000   0.4265165
>>>> Sepal.Width     0.4265165   1.0000000
>>>>
>>>>> sessionInfo()
>>>> R version 2.5.1 Patched (2007-09-16 r42884)
>>>> i386-pc-mingw32
>>>>
>>>> locale:
>>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>>>> States.1252;LC_MONETARY=English_United
>>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"
>>>> "methods"   "base"
>>>>
>>>> other attached packages:
>>>>  lattice
>>>> "0.16-5"
>>>>
>>>>> Thanks,
>>>>> Harold
>>>>>
>>>>>
>>>>>> sessionInfo()
>>>>> R version 2.5.0 (2007-04-23)
>>>>> i386-pc-mingw32
>>>>>
>>>>> locale:
>>>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>>>>> States.1252;LC_MONETARY=English_United
>>>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"
>>>>> "methods"   "base"     
>>>>>
>>>>> other attached packages:
>>>>>  lattice
>>>>> "0.15-4" 
>>>>>
>>>>>
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> --
>>>> Chuck Cleland, Ph.D.
>>>> NDRI, Inc.
>>>> 71 West 23rd Street, 8th floor
>>>> New York, NY 10010
>>>> tel: (212) 845-4495 (Tu, Th)
>>>> tel: (732) 512-0171 (M, W, F)
>>>> fax: (917) 438-0894
>>>>
>>>
>>
> 


