[R] Do the correlations of the components determine the correlation of the whole phenomenon?

Sun Dec 2 11:50:00 CET 2018

Hi,

I have the following dataset, Mesures. The column test identifies a given
context, and Space is a portion of that context. Each test has twelve Space
values, and for each one there is an empirical measure of a behavior,
Behavior_empirical, and a measure of the simulated behavior, Behavior_simulated.

Mesures=structure(list(test = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Space = c(1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L), Behavior_empirical = c(3.02040816326531, 7.95918367346939,
10.6162790697674, 4.64150943396226, 1.86538461538462, 1.125,
1.01020408163265, 1.2093023255814, 0.292452830188679, 0, 0, 0, 0,
1.3265306122449, 0, 3.09433962264151, 0, 1.6875, 2.02040816326531,
1.2093023255814, 1.75471698113208, 1.79347826086957,
0.243589743589744, 0, 0.377551020408163, 1.98979591836735,
6.75581395348837, 6.18867924528302, 7.46153846153846, 0.75, 0, 0,
0.292452830188679, 0, 0, 0, 0, 1.3265306122449, 1.93023255813953,
10.8301886792453, 3.73076923076923, 0, 2.69387755102041,
0.604651162790698, 1.75471698113208, 0, 0, 0, 1.51020408163265,
2.6530612244898, 3.86046511627907, 1.54716981132075, 1.86538461538462,
1.875, 2.35714285714286, 1.2093023255814, 0.292452830188679, 0, 0,
0.823529411764706, 6.79591836734694, 15.2551020408163,
5.7906976744186, 1.54716981132075, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0.773584905660377, 0, 0, 0.673469387755102, 1.81395348837209,
1.75471698113208, 2.51086956521739, 3.10576923076923,
3.70588235294118, 3.77551020408163, 9.28571428571428,
3.86046511627907, 1.54716981132075, 0, 0, 0, 0, 1.4622641509434, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0.673469387755102, 0, 0.292452830188679,
4.30434782608696, 1.09615384615385, 5.76470588235294, 0, 0,
1.93023255813953, 4.64150943396226, 3.73076923076923, 2.625,
0.673469387755102, 0.604651162790698, 0, 0, 0, 0), Behavior_simulated
= c(18, 61, 129, 198, 128, 57, 44, 80, 36, 8, 0, 0, 0, 0, 0, 49, 50,
194, 211, 353, 352, 214, 120, 15, 10, 74, 145, 224, 158, 99, 26, 19,
7, 2, 0, 0, 180, 89, 47, 36, 34, 56, 51, 65, 44, 4, 0, 0, 116, 133,
131, 103, 74, 132, 75, 44, 0, 0, 0, 0, 532, 165, 18, 5, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 6, 47, 164, 193, 185, 91, 239, 219, 168,
83, 1, 14, 45, 136, 129, 89, 5, 0, 0, 0, 0, 0, 0, 0, 0, 6, 17, 92,
280, 273, 0, 6, 25, 108, 129, 285, 171, 181, 39, 2, 0, 0)), .Names =
c("test", "Space", "Behavior_empirical", "Behavior_simulated"),
row.names = c(NA, 120L), class = "data.frame")

For each test, we compute the correlation between Behavior_empirical and
Behavior_simulated:

# Pearson correlation between empirical and simulated behavior, one row per test
Correlation <- data.frame()
for (i in 1:10) {
  Mes <- Mesures[Mesures$test == i, ]
  co <- data.frame(test = i, value = cor(Mes$Behavior_empirical, Mes$Behavior_simulated))
  Correlation <- rbind(Correlation, co)
}
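The same per-test correlations can also be computed without an explicit loop, by splitting the rows by test and applying cor() to each subset. The sketch below runs on a small hypothetical data frame, toy, that has the same column names as Mesures; its values are made up for illustration and are not the real data.

```r
# Hypothetical toy data with the same columns as Mesures (made-up values)
toy <- data.frame(
  test = rep(1:2, each = 4),
  Space = rep(1:4, times = 2),
  Behavior_empirical = c(1, 2, 3, 4, 1, 2, 3, 4),
  Behavior_simulated = c(2, 4, 6, 8, 8, 6, 4, 2)
)

# Split the rows by test, then correlate empirical vs simulated in each subset
per_test <- sapply(split(toy, toy$test), function(d)
  cor(d$Behavior_empirical, d$Behavior_simulated))

# Same layout as the Correlation data frame built by the loop
Correlation <- data.frame(test = as.integer(names(per_test)),
                          value = unname(per_test))
print(Correlation)
```

Replacing toy with Mesures should give the same ten values as the loop above.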

which gives good correlation values for most tests (test 4 is the exception):

       test      value
    1     1  0.5508683
    2     2  0.4369091
    3     3  0.9049806
    4     4 -0.1062714
    5     5  0.8410165
    6     6  0.5560825
    7     7  0.8088034
    8     8  0.7721232
    9     9  0.7088624
    10   10  0.5116938

Now, since Behavior_simulated agrees reasonably well with Behavior_empirical
within each test, we would like to conclude that the final distribution, i.e.
the sum of Behavior_simulated over all tests for each Space, also agrees with
the sum of Behavior_empirical:

library(dplyr)
Mesures_aggregated <- Mesures %>% group_by(Space) %>%
  summarize(Sum_Behavior_empirical = sum(Behavior_empirical),
            Sum_Behavior_simulated = sum(Behavior_simulated))
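If dplyr is not available, base R's aggregate() can build the same per-Space sums. This is sketched here on a hypothetical toy data frame with the same columns as Mesures (made-up values, not the real data); the renaming step only makes the column names match Mesures_aggregated.

```r
# Hypothetical toy data with the same columns as Mesures (made-up values)
toy <- data.frame(
  test = rep(1:2, each = 3),
  Space = rep(1:3, times = 2),
  Behavior_empirical = c(1, 2, 3, 4, 5, 6),
  Behavior_simulated = c(10, 20, 30, 40, 50, 60)
)

# Sum both behaviors over all tests, separately for each Space (base R only)
agg <- aggregate(cbind(Behavior_empirical, Behavior_simulated) ~ Space,
                 data = toy, FUN = sum)
names(agg) <- c("Space", "Sum_Behavior_empirical", "Sum_Behavior_simulated")
print(agg)
```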

I expected the correlation between these two sums to be good as well, but it
is not:

> cor(Mesures_aggregated$Sum_Behavior_empirical, Mesures_aggregated$Sum_Behavior_simulated)
[1] 0.07710804
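A short derivation suggests why the per-test values need not carry over to the sums. The notation is introduced here for illustration: e_t(s) and m_t(s) are Behavior_empirical and Behavior_simulated of test t at Space s, E(s) and M(s) are their sums over tests, r_t is the correlation of test t, and every covariance and standard deviation is taken over the Space values. Because the sample covariance is bilinear:

```latex
\operatorname{cov}(E, M)
  = \sum_{t}\sum_{u} \operatorname{cov}(e_t, m_u)
  = \sum_{t} r_t\,\sigma_{e_t}\,\sigma_{m_t}
  + \sum_{t \neq u} \operatorname{cov}(e_t, m_u),
\qquad
\operatorname{cor}(E, M) = \frac{\operatorname{cov}(E, M)}{\sigma_E\,\sigma_M}
```

Each r_t therefore enters only weighted by the spreads of test t, which favors tests with large values, and the cross-test covariances (t different from u) are not constrained by the r_t at all. High per-test correlations are thus compatible with a low, or even negative, aggregated correlation.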

Can the correlation of the whole phenomenon be deduced from the correlations
of its components? And how can the contribution of each test to the "Sum" be
evaluated?
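One possible way to attribute the aggregated correlation to individual tests relies on the bilinearity of covariance: cov(E, M), where E and M are the per-Space sums, equals the sum of cov(empirical of test t, simulated of test u) over all pairs of tests (t, u). Dividing each term by sd(E) * sd(M) gives pieces that add up exactly to cor(E, M). The helper cor_contributions() and the toy data below are hypothetical, a sketch assuming every test has one row per Space value. The toy example also shows two tests that each have correlation 1 while their sums have correlation -1.

```r
# Hypothetical toy data (not the real Mesures): two tests, four Space values.
# Within each test the series are perfectly correlated, but test 1 has large
# empirical values and test 2 has large simulated values.
toy <- data.frame(
  test = rep(1:2, each = 4),
  Space = rep(1:4, times = 2),
  Behavior_empirical = c(10, 20, 30, 40, 4, 3, 2, 1),
  Behavior_simulated = c(1, 2, 3, 4, 40, 30, 20, 10)
)

# Hypothetical helper: split cor(E, M) into one term per pair of tests (t, u).
# Assumes every test has exactly one row for each Space value.
cor_contributions <- function(d) {
  # Matrices with one row per Space and one column per test
  emp <- tapply(d$Behavior_empirical, list(d$Space, d$test), sum)
  sim <- tapply(d$Behavior_simulated, list(d$Space, d$test), sum)
  E <- rowSums(emp)
  M <- rowSums(sim)
  # C[t, u] = cov(empirical of test t, simulated of test u) / (sd(E) * sd(M))
  C <- cov(emp, sim) / (sd(E) * sd(M))
  list(
    total = sum(C),                      # equals cor(E, M)
    within_test = sum(diag(C)),          # pairs with t == u
    cross_test = sum(C) - sum(diag(C)),  # pairs with t != u
    by_test = rowSums(C)                 # one possible attribution per test
  )
}

within <- sapply(split(toy, toy$test),
                 function(d) cor(d$Behavior_empirical, d$Behavior_simulated))
res <- cor_contributions(toy)
print(within)
print(res)
```

Here the total is -1 while both per-test correlations are 1, and almost all of it comes from the cross_test part. With the real data, cor_contributions(Mesures)$total should reproduce the 0.07710804 above, and comparing within_test with cross_test would show how much of it comes from pairs of different tests. Note that by_test attributes each cross term to the test of the empirical series; splitting it differently is equally defensible.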

Thanks a lot for your help.

Lenny
