[R] Do the correlations of the components make the correlation of one phenomenon?

Tue Dec 11 11:02:48 CET 2018

Thanks a lot, David, for this detailed answer.

The aim is to say: if the simulated and empirical values correlate test by
test, their sums should also correlate.

I want to be sure that I understood correctly what you have done:
1) build a model (the fit) for each test from the empirical vs. simulated
values, and predict values from this model;
2) compare the summed predicted values from these models with the sum of the
empirical values. Is that right?
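
The two steps above can be sketched on made-up data as follows. This is only
a minimal sketch under stated assumptions: the data frame `toy`, its column
names `empirical` and `simulated`, and the slopes are hypothetical and are
not the Mesures data. The per-test regressions and the summed fitted values
follow the base R approach in the quoted reply below.

```r
# Toy data (hypothetical, NOT the Mesures data from this thread):
# 3 tests x 6 spaces, each test with its own slope.
set.seed(1)
toy <- expand.grid(Space = 1:6, test = 1:3)   # Space varies fastest
toy$simulated <- runif(nrow(toy), 0, 100)
slopes <- c(0.03, -0.01, 0.002)               # assumed per-test slopes
toy$empirical <- 1 + slopes[toy$test] * toy$simulated +
  rnorm(nrow(toy), sd = 0.2)

# Step 1: one regression of empirical on simulated per test
toy.split <- split(toy, toy$test)
fits <- lapply(toy.split, function(x) lm(empirical ~ simulated, data = x))

# Step 2: sum the fitted values across tests for each Space
# (rows within each test are already in Space order 1..6)
pred <- rowSums(sapply(fits, fitted))

# Summed empirical and simulated values per Space
agg <- aggregate(toy[, c("empirical", "simulated")],
                 by = list(Space = toy$Space), sum)

cor_raw  <- cor(agg$empirical, agg$simulated)  # sums of raw values
cor_pred <- cor(agg$empirical, pred)           # sums of per-test predictions
```

Printing cor_raw and cor_pred lets the two comparisons be checked side by
side. The predictions are fitted to the same empirical values they are then
compared with, so cor_pred describes in-sample fit only.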

Thanks a lot

On Monday, December 3, 2018, David L Carlson <dcarlson using tamu.edu> wrote:

> This is really a statistics question rather than an R question, but you
> did provide reproducible data. You have some moderate correlations for some
> of the tests, but they are all different relationships. You used a
> combination of base R and dplyr code, but I'll just stick with base R:
>
> > Mesures.split <- split(Mesures, Mesures$test)
> > Corrs <- sapply(Mesures.split, function(x) cor(x[, 3], x[, 4]))
> > options(digits=3)
> > Corrs
>      1      2      3      4      5      6      7      8      9     10
>  0.551  0.437  0.905 -0.106  0.841  0.556  0.809  0.772  0.709  0.512
>
> > sapply(Mesures.split, function(x) coef(lm(x[, 3]~x[, 4])))
>                  1      2       3        4      5      6      7
> (Intercept) 0.6875 0.6530 -0.2597  2.24313 0.3498 1.4436 0.4103
> x[, 4]      0.0309 0.0034  0.0353 -0.00668 0.0171 0.0168 0.0137
>                   8      9      10
> (Intercept) -0.7379 0.2929 0.48115
> x[, 4]       0.0255 0.0129 0.00891
>
> This gives you the intercept and slope for the regression lines for each
> test. Notice that they vary considerably. The slope for predicting
> empirical behavior from simulated behavior varies from -0.007 to 0.031.
> When you sum over the tests within each Space, you effectively eliminate
> the correlations seen at the test level:
>
> > Mesures_aggregated <- aggregate(Mesures[, 3:4], by=list(Mesures$Space), sum)
> > cor(Mesures_aggregated[, 2:3])[1, 2]
> [1] 0.0771
>
> If you sum predicted values for empirical behavior using the 10 regression
> equations and compare that to the summed empirical value, things work out
> better.
>
> > pred <- rowSums(sapply(Mesures.split, function(x) predict(lm(x[, 3]~x[, 4]))))
> > cor(Mesures_aggregated[, 2], pred)
> [1] 0.776
>
> Without knowing where the simulated values come from, especially if they
> are completely independent of the empirical values, I can't say if this
> approach is wise.
>
> ---------------------------------------
> David L. Carlson
> Department of Anthropology
> Texas A&M University
>
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces using r-project.org] On Behalf Of Fatma Ell
> Sent: Sunday, December 2, 2018 4:50 AM
> To: r-help using r-project.org
> Subject: [R] Do the correlations of the components make the correlation of
> one phenomenon?
>
> Hi,
>
> I have the following dataset, Mesures. It contains test, which is a given
> context, and Space, which is a portion of that context. For each test we
> have twelve Space values, an empirical measure of a behavior,
> Behavior_empirical, and a measure of simulated behavior,
> Behavior_simulated.
>
> Mesures=structure(list(test = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
> 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
> 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
> 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L,
> 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
> 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L,
> 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Space = c(1L, 2L, 3L,
> 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
> 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
> 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L,
> 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
> 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L,
> 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L,
> 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
> 11L, 12L), Behavior_empirical = c(3.02040816326531, 7.95918367346939,
> 10.6162790697674, 4.64150943396226, 1.86538461538462, 1.125,
> 1.01020408163265, 1.2093023255814, 0.292452830188679, 0, 0, 0, 0,
> 1.3265306122449, 0, 3.09433962264151, 0, 1.6875, 2.02040816326531,
> 1.2093023255814, 1.75471698113208, 1.79347826086957,
> 0.243589743589744, 0, 0.377551020408163, 1.98979591836735,
> 6.75581395348837, 6.18867924528302, 7.46153846153846, 0.75, 0, 0,
> 0.292452830188679, 0, 0, 0, 0, 1.3265306122449, 1.93023255813953,
> 10.8301886792453, 3.73076923076923, 0, 2.69387755102041,
> 0.604651162790698, 1.75471698113208, 0, 0, 0, 1.51020408163265,
> 2.6530612244898, 3.86046511627907, 1.54716981132075, 1.86538461538462,
> 1.875, 2.35714285714286, 1.2093023255814, 0.292452830188679, 0, 0,
> 0.823529411764706, 6.79591836734694, 15.2551020408163,
> 5.7906976744186, 1.54716981132075, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0.773584905660377, 0, 0, 0.673469387755102, 1.81395348837209,
> 1.75471698113208, 2.51086956521739, 3.10576923076923,
> 3.70588235294118, 3.77551020408163, 9.28571428571428,
> 3.86046511627907, 1.54716981132075, 0, 0, 0, 0, 1.4622641509434, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0.673469387755102, 0, 0.292452830188679,
> 4.30434782608696, 1.09615384615385, 5.76470588235294, 0, 0,
> 1.93023255813953, 4.64150943396226, 3.73076923076923, 2.625,
> 0.673469387755102, 0.604651162790698, 0, 0, 0, 0), Behavior_simulated
> = c(18, 61, 129, 198, 128, 57, 44, 80, 36, 8, 0, 0, 0, 0, 0, 49, 50,
> 194, 211, 353, 352, 214, 120, 15, 10, 74, 145, 224, 158, 99, 26, 19,
> 7, 2, 0, 0, 180, 89, 47, 36, 34, 56, 51, 65, 44, 4, 0, 0, 116, 133,
> 131, 103, 74, 132, 75, 44, 0, 0, 0, 0, 532, 165, 18, 5, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 1, 0, 0, 6, 47, 164, 193, 185, 91, 239, 219, 168,
> 83, 1, 14, 45, 136, 129, 89, 5, 0, 0, 0, 0, 0, 0, 0, 0, 6, 17, 92,
> 280, 273, 0, 6, 25, 108, 129, 285, 171, 181, 39, 2, 0, 0)), .Names =
> c("test", "Space", "Behavior_empirical", "Behavior_simulated"),
> row.names = c(NA, 120L), class = "data.frame")
>
> For each test we study the correlation between Behavior_empirical and
> Behavior_simulated:
>
> # Collect one correlation per test in a data frame
> Correlation <- data.frame()
> for (i in 1:10) {
>   Mes <- Mesures[Mesures$test == i, ]
>   co <- data.frame(test = i,
>                    value = cor(Mes$Behavior_empirical,
>                                Mes$Behavior_simulated))
>   Correlation <- rbind(Correlation, co)
> }
>
> which gives us good correlation values for most tests:
>
>    test       value
> 1     1  0.55086832
> 2     2  0.43690913
> 3     3  0.90498064
> 4     4 -0.10627145
> 5     5  0.84101656
> 6     6  0.55608257
> 7     7  0.80880348
> 8     8  0.77212329
> 9     9  0.7088624
> 10   10  0.5116938
>
> Now we would like to conclude that, if Behavior_simulated is good for each
> test, we can build the final distribution as the sum of Behavior_simulated
> and compare it with the sum of Behavior_empirical.
>
> library(dplyr)
> Mesures_aggregated <- Mesures %>% group_by(Space) %>%
>   summarize(Sum_Behavior_empirical = sum(Behavior_empirical),
>             Sum_Behavior_simulated = sum(Behavior_simulated))
>
> I would expect the final correlation to be good, but that is not the case:
>
> > cor(Mesures_aggregated$Sum_Behavior_empirical, Mesures_aggregated$Sum_Behavior_simulated)
> [1] 0.07710804
>
> Can the correlation of a whole phenomenon be derived from the correlations
> of its components? And how can I evaluate the contribution of each
> component test in building the 'Sum'?
>
>
> Thanks a lot for your help.
>
>
> Lenny