[R] Re: Sampling in R

Petr PIKAL petr.pikal at precheza.cz
Tue Apr 21 13:08:55 CEST 2009


Hi

r-help-bounces at r-project.org wrote on 21.04.2009 12:25:01:

> 
> Dear R users,
>
> I need to do sampling without replacement (bootstraps). I have two
> variables (Xvar, Yvar).
> The correlation in my original data set is cor(Xvar, Yvar)=0.6174221.
> I am doing 50000 samplings and, in each one, calculating the correlation,
> saving it, sorting the results and taking the 95% cut-off point (0.1351877).
> The maximum value I get is 0.3507219 (much smaller than the correlation
> in my original data).
> I repeated the sampling a couple of times and none of the runs produced a
> correlation coefficient higher than that of my original data set. However,
> if I sort my Xvar and Yvar and compute the correlation, it is 0.9657125,
> which is much higher than the correlation in my original data.
> I am doing the sampling in another program and getting at least 1% higher
> correlation than mine.
> Now I am confused about sampling (random data) in R. My data and code
> for the scenario above are below.
> 
> 
> 
> Xvar <- c(0.1818182,0.5384615,0.5535714,0.4680851,0.4545455,0.4385965,0.5185185,
>   0.4035088,0.4901961,0.3650794,0.462963,0.4,0.56,0.3965517,0.4909091,
>   0.4716981,0.4310345,0.2,0.1509434,0.2647059,0.173913,0.1914894,
>   0.1914894,0.1489362,0.1363636,0.2244898,0.2325581,0.1333333,0.1818182,0.1702128,
>   0.2173913,0.2380952,0.1632653,0.5614035,0.3396226,0.4909091,0.3770492,
>   0.5,0.5185185,0.5,0.4666667,0.4464286,0.362069,0.4285714,0.4561404,
>   0.4736842,0.4545455,0.4166667,0.4181818,0.4590164,0.5166667,0.5423729,
>   0.4833333,0.5454545,0.4393939,0.5172414,0.4098361,0.4745763,0.4754098,
>   0.5166667,0.5,0.4603175,0.42,0.4038462,0.4897959,0.3148148,0.3673469,
>   0.4,0.4583333,0.3877551,0.4375,0.4117647,0.4313725,0.5333333,0.3962264,
>   0.3548387,0.5272727,0.4137931,0.3928571,0.4666667,0.4210526,0.4363636,
>   0.4545455,0.4310345,0.4237288,0.4814815,0.4912281,0.4333333,0.4,0.4285714,
>   0.4516129,0.5090909,0.4464286,0.4642857,0.4166667,0.4098361,0.4909091,
>   0.3809524,0.5272727,0.4814815,0.5254237,0.627451,0.5,0.5471698,0.5454545,
>   0.5925926,0.5769231,0.5818182,0.4444444,0.4915254,0.4727273,0.4107143,
>   0.4285714,0.4310345,0.4237288,0.4285714,0.440678,0.4237288,0.4807692,
>   0.4150943,0.4615385,0.4107143,0.4814815,0.4074074,0.4210526,0.5263158,
>   0.440678,0.4576271,0.5344828,0.5,0.5636364,0.4677419,0.5,0.5192308,
>   0.4642857,0.5090909,0.58,0.4482759,0.5098039,0.4035088,0.4210526,
>   0.5098039,0.4385965,0.5283019,0.5471698,0.625,0.4310345,0.4912281,0.5283019,
>   0.4576271,0.5471698,0.4745763,0.4821429)
> 
> 
> Yvar <- c(0.2553191,0.4107143,0.5660377,0.3888889,0.3606557,0.2898551,0.3818182,
>   0.4,0.4,0.3278689,0.2903226,0.4074074,0.4181818,0.3,0.2238806,0.3728814,
>   0.3709677,0.2307692,0.2830189,0.2244898,0.2142857,0.2131148,0.22,
>   0.2258065,0.2321429,0.2,0.2264151,0.22,0.2115385,0.2459016,0.1166667,0.1785714,
>   0.2068966,0.6,0.4285714,0.3134328,0.4461538,0.3965517,0.4769231,
>   0.6181818,0.4827586,0.3709677,0.3965517,0.4821429,0.4545455,0.359375,0.4576271,
>   0.4516129,0.5272727,0.4603175,0.4,0.4912281,0.5384615,0.5,0.4516129,
>   0.4126984,0.4655172,0.5263158,0.4925373,0.358209,0.4285714,0.4920635,
>   0.4482759,0.3235294,0.4,0.4375,0.440678,0.3898305,0.35,0.4528302,0.58,
>   0.4153846,0.3174603,0.5185185,0.3870968,0.2894737,0.3709677,0.369863,
>   0.3676471,0.3636364,0.3088235,0.328125,0.4032258,0.4084507,0.3188406,
>   0.3636364,0.3823529,0.2816901,0.4722222,0.5,0.3521127,0.4393939,0.3787879,
>   0.453125,0.4324324,0.4057971,0.4545455,0.4492754,0.5,0.4098361,
>   0.4067797,0.3666667,0.3928571,0.4285714,0.5,0.2923077,0.4561404,0.45,0.5538462,
>   0.4626866,0.4057971,0.3676471,0.5322581,0.5428571,0.375,0.4411765,
>   0.4571429,0.4,0.3846154,0.3870968,0.4915254,0.530303,0.4375,0.4918033,0.4179104,
>   0.4032258,0.3606557,0.5178571,0.4848485,0.390625,0.375,0.4375,
>   0.3666667,0.4,0.4477612,0.2571429,0.4032258,0.3382353,0.4814815,0.4090909,0.3548387,
>   0.4821429,0.5,0.557377,0.4333333,0.5454545,0.4590164,0.3943662,
>   0.5076923,0.5,0.3283582,0.3676471,0.559322)
> 
> my.cor<-cor(Xvar, Yvar)
> print(my.cor)
> 
> nperm<-49999
> Perm.Cor<-NULL
> 
> for (iperm in 1:nperm) {
>   XvarNew<-sample(Xvar, size=length(Xvar), replace=FALSE)
>   YvarNew<-sample(Yvar, size=length(Yvar), replace=FALSE)
>   perm.cor<-cor(XvarNew, YvarNew)
>   Perm.Cor<-c(Perm.Cor, perm.cor)
> }

As far as I can understand, you are not sampling your data, you are
shuffling them. You then compute cor on the shuffled data (X and Y are
shuffled independently, so the pairing between them is destroyed), which
results in low correlation (it is like shuffling cards).
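
To illustrate, here is a minimal sketch on simulated data (the vectors x and
y, the seed and the number of permutations are assumptions made for the
example, not your data). Shuffling breaks the X-Y pairing, so the shuffled
correlations scatter around zero. That is the distribution expected when
there is no association, and an observed correlation far above its 95% point
suggests a real association rather than a problem with sample().

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
n <- 150
x <- rnorm(n)                     # hypothetical stand-in for Xvar
y <- 0.6 * x + 0.8 * rnorm(n)     # hypothetical stand-in for Yvar (true correlation 0.6)

obs.cor <- cor(x, y)              # correlation of the intact pairs

# Shuffle y only; shuffling x as well would give the same kind of distribution.
nperm <- 2000
perm.cor <- replicate(nperm, cor(x, sample(y)))

cutoff95 <- quantile(perm.cor, 0.95)                      # upper 5% point under "no association"
p.value  <- (sum(perm.cor >= obs.cor) + 1) / (nperm + 1)  # one-sided permutation p-value

print(obs.cor)
print(cutoff95)
print(p.value)
```

With n around 150 the shuffled correlations have a standard deviation of
roughly 1/sqrt(n-1), about 0.08, so a 95% point near 0.13 and a maximum well
below 0.6 are what one would expect from shuffling.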

Maybe you could use a smaller sample size and sample not the original data
but a vector of indices, so that each X value stays with its Y value:

perm.cor <- rep(NA, nperm)

for (iperm in 1:nperm) {
  # draw 100 row indices and use the same ones for both variables
  ind <- sample(1:length(Xvar), size = 100, replace = FALSE)
  perm.cor[iperm] <- cor(Xvar[ind], Yvar[ind])
}
max(perm.cor)
hist(perm.cor)

The result seems to be quite reasonable.
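
If what you actually want is a bootstrap, the usual approach draws the
indices with replacement and, again, uses the same indices for both
variables. A minimal sketch on simulated data (x, y, the seed and nboot are
assumptions for illustration; you would substitute Xvar and Yvar):

```r
set.seed(2)                       # arbitrary seed, only for reproducibility
n <- 150
x <- rnorm(n)                     # hypothetical stand-in for Xvar
y <- 0.6 * x + 0.8 * rnorm(n)     # hypothetical stand-in for Yvar

nboot <- 2000
boot.cor <- numeric(nboot)
for (i in seq_len(nboot)) {
  # resample whole (x, y) pairs: one index vector, drawn with replacement
  ind <- sample(n, size = n, replace = TRUE)
  boot.cor[i] <- cor(x[ind], y[ind])
}

# percentile interval for the correlation
ci95 <- quantile(boot.cor, c(0.025, 0.975))
print(cor(x, y))
print(ci95)
```

The percentile interval is only one of several bootstrap interval types; it
is used here because it needs nothing beyond quantile().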

Regards
Petr



> print(max(Perm.Cor))
> XvarSorted<-sort(Xvar, decreasing=TRUE)
> YvarSorted<-sort(Yvar, decreasing=TRUE)
> max.cor<-cor(XvarSorted, YvarSorted)
> print(max.cor)
> if(my.cor>0) Perm.Cor.Sorted<-sort(Perm.Cor, decreasing=TRUE)
> if(my.cor<0) Perm.Cor.Sorted<-sort(Perm.Cor, decreasing=FALSE)
> T95<-Perm.Cor.Sorted[(nperm+1)*0.05]    # 95% threshold value
> T99<-Perm.Cor.Sorted[(nperm+1)*0.01]    # 99% threshold value
> 
> 
> 
> I want to understand where I am making a mistake. Any comment is deeply
> appreciated.
> 
> Kind Regards
> 
> Seyit Ali
> 
>
> ------------------------------------------------------------
> Dr. Seyit Ali KAYIS
> Selcuk University
> Faculty of Agriculture
> Kampus, Konya, TURKEY
>
> s_a_kayis at yahoo.com, s_a_kayis at hotmail.com
> Tel: +90 332 223 2830  Mobile: +90 535 587 1139  Fax: +90 332 241 0108
>
> Greetings from Konya, TURKEY
> http://www.ziraat.selcuk.edu.tr/skayis/
> ------------------------------------------------------------