# [R] Approach for Storing Result Data

G.Maubach at weinwolf.de G.Maubach at weinwolf.de
Wed Mar 8 16:27:08 CET 2017

```
Hi All,

today I have a more general question concerning the approach of storing
different values from the analysis of multiple variables.

My task is to compare distributions in a universe with the distributions
among the respondents across a whole bunch of variables. The comparison is
to be based on relative frequencies (proportions).

I was thinking about the structure I should store the results in and came
up with the following:

-- cut --

library(ggplot2)  # needed for the plot below
library(stringi)

# Result data frame
# Some sort of tidy ("long") data set where
# each value is stored in its own row.
# This way all values for all variables can be stored in
# one single data structure.
# One could also build a result data set across
# surveys.
# Values for measure could be "number" for 'raw' values or
# "freq" for frequencies/counts.
# Values for unit could be "n" for 'numbers' and
# "%" for percentages.
d_test <- data.frame(
  group = rep(c("Universe", "Respondents"), each = 16),
  variable = rep("State", 32),
  value = rep(c(11.3,
                12.7,
                3.3,
                5,
                0.6,
                8.1,
                6.2,
                5.8,
                6.4,
                14.5,
                8.3,
                0.3,
                3.8,
                2.5,
                8.1,
                3), 2),
  label = rep(c("Baden-Wuerttemberg",
                "Bayern",
                "Berlin",
                "Brandenburg",
                "Bremen",
                "Hamburg",
                "Hessen",
                "Mecklenburg-Vorpommern",
                "Niedersachsen",
                "Nordrhein-Westfalen",
                "Rheinland-Pfalz",
                "Saarland",
                "Sachsen",
                "Sachsen-Anhalt",
                "Schleswig-Holstein",
                "Thueringen"), 2),
  measure = rep("freq", 32),
  unit = rep("%", 32),
  stringsAsFactors = FALSE
)

# This way the variables can be selected using simple
# value selection from Base R functionality.
data <- d_test[d_test$variable == "State", ]

# And plot results for every variable.
ggplot(
  data = data,
  aes(
    x = label,
    y = value,
    fill = group)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_discrete(name = stringi::stri_trans_totitle(names(data)[1])) +
  scale_x_discrete(name = data$variable[1]) +
  # value is numeric, so the y axis needs a continuous scale
  scale_y_continuous(name = data$unit[1])

-- cut --

The reporting / presentation is done in R Markdown. I would load the
result data set once at the beginning and then run the comparisons as plots
for each variable named in the results data set under "variable".

If I follow this approach for my customer relationship survey, do you think
I would face drawbacks or run into serious trouble?

I am interested in your opinion and open to other approaches and
suggestions.

Kind regards

Georg

```
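
The post describes filling one long-format result table and then working through it variable by variable. A minimal base R sketch of that workflow might look as follows. The helper `to_long_freq()` and the raw data frames `universe` and `respondents` are made up for illustration and do not appear in the original message; only the column layout is taken from `d_test`.

```r
# Hypothetical helper: turn one raw categorical vector into long-format rows
# with the same columns as d_test (group, variable, value, label, measure, unit).
to_long_freq <- function(x, group, variable) {
  props <- prop.table(table(x)) * 100  # relative frequencies in percent
  data.frame(
    group    = group,
    variable = variable,
    value    = round(as.numeric(props), 1),
    label    = names(props),
    measure  = "freq",
    unit     = "%",
    stringsAsFactors = FALSE
  )
}

# Made-up raw data, for illustration only
universe <- data.frame(
  Sex    = c("f", "f", "m", "m"),
  Region = c("North", "South", "South", "South"),
  stringsAsFactors = FALSE
)
respondents <- data.frame(
  Sex    = c("f", "m", "m", "m"),
  Region = c("North", "North", "South", "South"),
  stringsAsFactors = FALSE
)

# Stack the proportions of every variable and both groups into one table
vars <- c("Sex", "Region")
results <- do.call(rbind, c(
  lapply(vars, function(v) to_long_freq(universe[[v]], "Universe", v)),
  lapply(vars, function(v) to_long_freq(respondents[[v]], "Respondents", v))
))

# One data frame per variable, ready to be processed in a loop
by_variable <- split(results, results$variable)
for (v in names(by_variable)) {
  print(by_variable[[v]])
}
```

Each element of `by_variable` has the same shape as the `data` subset in the post, so it could be handed to the same plotting code inside the loop. One caveat of this sketch: a category that never occurs in one group produces no row for that group; converting the raw columns to factors with shared levels before calling `table()` would keep such categories with a value of 0.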