[R] getting data from a "vertical" table into a "2-dimensional" grid

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Sat Oct 22 01:20:06 CEST 2022


"As my end result, I want a matrix or data frame, with one row for each
year, and one column for each category."

If I understand you correctly, no reshaping gymnastics are needed --
just use ?tapply:

set.seed(1)
do <- data.frame(year = rep(1990:1999, length  = 50),
category = sample(1:5, size = 50, replace = TRUE),
sales = sample(0:99999, size = 50 , replace = TRUE) )


with(do, tapply(sales, list(year, category),sum))
 ## which gives the matrix:

         1      2      3     4     5
1990  13283     NA  55083 87522 64877
1991     NA  80963     NA 30100 28277
1992   9391 202916     NA 55090    NA
1993  29696 167344     NA    NA 17625
1994  98015  99521     NA 70536 52252
1995 157003     NA  26875    NA 11366
1996  32986  88683   6562 79475 95282
1997  13601     NA 134757 12398    NA
1998  30537  51117  31333 20204    NA
1999  39240  87845  62479    NA 98804

If this is not what you wanted, you may need to explain further or
await a response from someone more insightful than I.

Cheers,
Bert


On Fri, Oct 21, 2022 at 3:34 PM Kelly Thompson <kt1572757 using gmail.com> wrote:
>
> As my end result, I want a matrix or data frame, with one row for each
> year, and one column for each category.
>
> On Fri, Oct 21, 2022 at 6:23 PM Kelly Thompson <kt1572757 using gmail.com> wrote:
> >
> > # I think this might be a better example.
> >
> > # I have data presented in a "vertical" dataframe as shown below in
> > data_original.
> > # I want this data in a matrix or "grid", as shown below.
> > # What I show below seems like one way this can be done.
> >
> > # My question: Are there easier or better ways to do this, especially
> > in Base R, and also in R packages?
> >
> > #create data
> > set.seed(1)
> > data_original <- data.frame(year = rep(1990:1999, length  = 50),
> > category = sample(1:5, size = 50, replace = TRUE),  sales =
> > sample(0:99999, size = 50 , replace = TRUE) )
> > dim(data_original)
> >
> > #remove rows where data_original$year == 1990 & data_original$category
> > == 5, to ensure there is at least one NA in the "grid"
> > data_original <- data_original[ (data_original$year == 1990 &
> > data_original$category == 5) == FALSE, ]
> > dim(data_original)
> >
> > #aggregate data
> > data_aggregate_sum_by_year_and_category <- aggregate(x =
> > data_original$sales, by = list(year = data_original$year, category =
> > data_original$category), FUN = sum)
> > colnames(data_aggregate_sum_by_year_and_category) <- c('year',
> > 'category', 'sum_of_sales')
> > dim(data_aggregate_sum_by_year_and_category)
> >
> > data_expanded <- expand.grid(year =
> > unique(data_aggregate_sum_by_year_and_category$year), category =
> > unique(data_aggregate_sum_by_year_and_category$category))
> > dim(data_expanded)
> > data_expanded <- merge(data_expanded,
> > data_aggregate_sum_by_year_and_category, all = TRUE)
> > dim(data_expanded)
> >
> > mat <- matrix(data = data_expanded$sum_of_sales, nrow =
> > length(unique(data_expanded$year)), ncol =
> > length(unique(data_expanded$category)) , byrow = TRUE, dimnames =
> > list( unique(data_expanded$year), unique(data_expanded$category) ) )
> >
> >
> > data_original
> > data_expanded
> > mat
> >
> > On Fri, Oct 21, 2022 at 5:03 PM Kelly Thompson <kt1572757 using gmail.com> wrote:
> > >
> > > ###
> > > #I have data presented in a "vertical" data frame as shown below in
> > > data_original.
> > > #I want this data in a matrix or "grid", as shown below.
> > > #What I show below seems like one way this can be done.
> > >
> > > #My question: Are there easier or better ways to do this, especially
> > > in Base R, and also in R packages?
> > >
> > > #reproducible example
> > >
> > > data_original <- data.frame(year = c('1990', '1999', '1990', '1989'),
> > > size = c('s', 'l', 'xl', 'xs'),  n = c(99, 33, 3, 4) )
> > >
> > > data_expanded <- expand.grid(unique(data_original$year),
> > > unique(data_original$size), stringsAsFactors = FALSE )
> > > colnames(data_expanded) <- c('year', 'size')
> > > data_expanded <- merge(data_expanded, data_original, all = TRUE)
> > >
> > > mat <- matrix(data = data_expanded $n, nrow =
> > > length(unique(data_expanded $year)), ncol =
> > > length(unique(data_expanded $size)) , byrow = TRUE, dimnames = list(
> > > unique(data_expanded$year), unique(data_expanded$size) ) )
> > >
> > > data_original
> > > data_expanded
> > > mat
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list