[R] getting data from a "vertical" table into a "2-dimensional" grid
CALUM POLWART
po|c1410 @end|ng |rom gm@||@com
Sat Oct 22 08:27:44 CEST 2022
You asked for base R but also said or using other methods. So for
completeness here is a solution using Tidyverse
library(tidyverse)
data_original <- data.frame(
year = c('1990', '1999', '1990', '1989'),
size = c('s', 'l', 'xl', 'xs'), n = c(99, 33, 3, 4) )
data_original |>
pivot_wider(
id_cols = year,
names_from = size,
values_from = n
)
# A tibble: 3 x 5
year s l xl xs
<chr> <dbl> <dbl> <dbl> <dbl>
1 1990 99 NA 3 NA
2 1999 NA 33 NA NA
3 1989 NA NA NA 4
If you really want a matrix add
|>
as.matrix()
To the end.
I saw you asked if you can use column position and the answer is the same
here you can replace size with 2. BUT beware this is prone to errors if
data gets changes in order etc.
This obviously needs to use tidyverse which some people hate. But it does
make the code much more readable to understand what you are doing...
On Fri, 21 Oct 2022, 23:34 Kelly Thompson, <kt1572757 using gmail.com> wrote:
> As my end result, I want a matrix or data frame, with one row for each
> year, and one column for each category.
>
> On Fri, Oct 21, 2022 at 6:23 PM Kelly Thompson <kt1572757 using gmail.com>
> wrote:
> >
> > # I think this might be a better example.
> >
> > # I have data presented in a "vertical" dataframe as shown below in
> > data_original.
> > # I want this data in a matrix or "grid", as shown below.
> > # What I show below seems like one way this can be done.
> >
> > # My question: Are there easier or better ways to do this, especially
> > in Base R, and also in R packages?
> >
> > #create data
> > set.seed(1)
> > data_original <- data.frame(year = rep(1990:1999, length = 50),
> > category = sample(1:5, size = 50, replace = TRUE), sales =
> > sample(0:99999, size = 50 , replace = TRUE) )
> > dim(data_original)
> >
> > #remove rows where data_original$year == 1990 & data_original$category
> > == 5, to ensure there is at least one NA in the "grid"
> > data_original <- data_original[ (data_original$year == 1990 &
> > data_original$category == 5) == FALSE, ]
> > dim(data_original)
> >
> > #aggregate data
> > data_aggregate_sum_by_year_and_category <- aggregate(x =
> > data_original$sales, by = list(year = data_original$year, category =
> > data_original$category), FUN = sum)
> > colnames(data_aggregate_sum_by_year_and_category) <- c('year',
> > 'category', 'sum_of_sales')
> > dim(data_aggregate_sum_by_year_and_category)
> >
> > data_expanded <- expand.grid(year =
> > unique(data_aggregate_sum_by_year_and_category$year), category =
> > unique(data_aggregate_sum_by_year_and_category$category))
> > dim(data_expanded)
> > data_expanded <- merge(data_expanded,
> > data_aggregate_sum_by_year_and_category, all = TRUE)
> > dim(data_expanded)
> >
> > mat <- matrix(data = data_expanded$sum_of_sales, nrow =
> > length(unique(data_expanded$year)), ncol =
> > length(unique(data_expanded$category)) , byrow = TRUE, dimnames =
> > list( unique(data_expanded$year), unique(data_expanded$category) ) )
> >
> >
> > data_original
> > data_expanded
> > mat
> >
> > On Fri, Oct 21, 2022 at 5:03 PM Kelly Thompson <kt1572757 using gmail.com>
> wrote:
> > >
> > > ###
> > > #I have data presented in a "vertical" data frame as shown below in
> > > data_original.
> > > #I want this data in a matrix or "grid", as shown below.
> > > #What I show below seems like one way this can be done.
> > >
> > > #My question: Are there easier or better ways to do this, especially
> > > in Base R, and also in R packages?
> > >
> > > #reproducible example
> > >
> > > data_original <- data.frame(year = c('1990', '1999', '1990', '1989'),
> > > size = c('s', 'l', 'xl', 'xs'), n = c(99, 33, 3, 4) )
> > >
> > > data_expanded <- expand.grid(unique(data_original$year),
> > > unique(data_original$size), stringsAsFactors = FALSE )
> > > colnames(data_expanded) <- c('year', 'size')
> > > data_expanded <- merge(data_expanded, data_original, all = TRUE)
> > >
> > > mat <- matrix(data = data_expanded $n, nrow =
> > > length(unique(data_expanded $year)), ncol =
> > > length(unique(data_expanded $size)) , byrow = TRUE, dimnames = list(
> > > unique(data_expanded$year), unique(data_expanded$size) ) )
> > >
> > > data_original
> > > data_expanded
> > > mat
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list