[R] getting data from a "vertical" table into a "2-dimensional" grid
Kelly Thompson
kt1572757 @end|ng |rom gm@||@com
Sat Oct 22 02:13:59 CEST 2022
Bert,
Thanks! I'm pretty sure what you provided gets me to what I was
looking for, and is much simpler. I really appreciate your help.
A follow-up question:
I adjusted the code to not use "hard-coded" column names.
mat2 <- with(data_original, tapply( get(names(data_original)[3]),
list( get(names(data_original)[1]), get(names(data_original)[2])), sum
))
Is there any better way to write that?
Thanks again!
-----
For clarity and to improve upon what I previously wrote, and so I can
practice writing questions like this and asking for help, here's a
recap of my question and "reproducible code", and the "better way" you
provided:
I have data presented in a 3-column data frame as shown below in
"data_original".
I want to aggregate the data in column 3, with the "by" argument using
the first and second columns of "data_original".
I want the results of the aggregation in a matrix, as shown below in "mat1".
As my end "result", I want a matrix with one row for each unique value
of column1 of data_original and one column for each unique value of
column2 of data_original.
What I show below seems like one way this can be done.
My question: Are there easier or better ways to do this, especially in
Base R, and also in R packages?
#create data
set.seed(1)
data_original <- data.frame(year = rep(1990:1999, length = 50),
category = sample(1:5, size = 50, replace = TRUE), sales =
sample(0:99999, size = 50 , replace = TRUE) )
dim(data_original)
#remove rows where data_original[,1] == 1990 & data_original[,2] == 5,
to ensure there is at least one NA in the desired matrix (this is an
"edge" case I want the code to "deal with" correctly.)
data_original <- data_original[ (data_original[,1] == 1990 &
data_original[,2] == 5) == FALSE, ]
dim(data_original)
#aggregate data
data_aggregate_col3_by_col1_and_col2 <- aggregate(x =
data_original[3], by = list(data_original[,1], data_original[,2]), FUN
= sum)
colnames(data_aggregate_col3_by_col1_and_col2) <- colnames(data_original)
dim(data_aggregate_col3_by_col1_and_col2)
data_expanded <-
expand.grid(unique(data_aggregate_col3_by_col1_and_col2[,1]),
unique(data_aggregate_col3_by_col1_and_col2[,2]))
colnames(data_expanded) <- colnames(data_aggregate_col3_by_col1_and_col2)[1:2]
dim(data_expanded)
data_expanded <- merge(data_expanded,
data_aggregate_col3_by_col1_and_col2, all = TRUE)
dim(data_expanded)
mat1 <- matrix(data = data_expanded[,3], nrow =
length(unique(data_expanded[,1])), ncol =
length(unique(data_expanded[,2])) , byrow = TRUE, dimnames = list(
unique(data_expanded[,1]), unique(data_expanded[,2]) ) )
#this is an easier way, using with and tapply
mat2 <- with(data_original, tapply( get(names(data_original)[3]),
list( get(names(data_original)[1]), get(names(data_original)[2])), sum
))
#check that mat1 and mat 2 are "nearly equal"
all.equal(mat1, mat2)
Gunter <bgunter.4567 using gmail.com> wrote:
>
> "As my end result, I want a matrix or data frame, with one row for each
> year, and one column for each category."
>
> If I understand you correctly, no reshaping gymnastics are needed --
> just use ?tapply:
>
> set.seed(1)
> do <- data.frame(year = rep(1990:1999, length = 50),
> category = sample(1:5, size = 50, replace = TRUE),
> sales = sample(0:99999, size = 50 , replace = TRUE) )
>
>
> with(do, tapply(sales, list(year, category),sum))
> ## which gives the matrix:
>
> 1 2 3 4 5
> 1990 13283 NA 55083 87522 64877
> 1991 NA 80963 NA 30100 28277
> 1992 9391 202916 NA 55090 NA
> 1993 29696 167344 NA NA 17625
> 1994 98015 99521 NA 70536 52252
> 1995 157003 NA 26875 NA 11366
> 1996 32986 88683 6562 79475 95282
> 1997 13601 NA 134757 12398 NA
> 1998 30537 51117 31333 20204 NA
> 1999 39240 87845 62479 NA 98804
>
> If this is not what you wanted, you may need to explain further or
> await a response from someone more insightful than I.
>
> Cheers,
> Bert
>
>
> On Fri, Oct 21, 2022 at 3:34 PM Kelly Thompson <kt1572757 using gmail.com> wrote:
> >
> > As my end result, I want a matrix or data frame, with one row for each
> > year, and one column for each category.
> >
> > On Fri, Oct 21, 2022 at 6:23 PM Kelly Thompson <kt1572757 using gmail.com> wrote:
> > >
> > > # I think this might be a better example.
> > >
> > > # I have data presented in a "vertical" dataframe as shown below in
> > > data_original.
> > > # I want this data in a matrix or "grid", as shown below.
> > > # What I show below seems like one way this can be done.
> > >
> > > # My question: Are there easier or better ways to do this, especially
> > > in Base R, and also in R packages?
> > >
> > > #create data
> > > set.seed(1)
> > > data_original <- data.frame(year = rep(1990:1999, length = 50),
> > > category = sample(1:5, size = 50, replace = TRUE), sales =
> > > sample(0:99999, size = 50 , replace = TRUE) )
> > > dim(data_original)
> > >
> > > #remove rows where data_original$year == 1990 & data_original$category
> > > == 5, to ensure there is at least one NA in the "grid"
> > > data_original <- data_original[ (data_original$year == 1990 &
> > > data_original$category == 5) == FALSE, ]
> > > dim(data_original)
> > >
> > > #aggregate data
> > > data_aggregate_sum_by_year_and_category <- aggregate(x =
> > > data_original$sales, by = list(year = data_original$year, category =
> > > data_original$category), FUN = sum)
> > > colnames(data_aggregate_sum_by_year_and_category) <- c('year',
> > > 'category', 'sum_of_sales')
> > > dim(data_aggregate_sum_by_year_and_category)
> > >
> > > data_expanded <- expand.grid(year =
> > > unique(data_aggregate_sum_by_year_and_category$year), category =
> > > unique(data_aggregate_sum_by_year_and_category$category))
> > > dim(data_expanded)
> > > data_expanded <- merge(data_expanded,
> > > data_aggregate_sum_by_year_and_category, all = TRUE)
> > > dim(data_expanded)
> > >
> > > mat <- matrix(data = data_expanded$sum_of_sales, nrow =
> > > length(unique(data_expanded$year)), ncol =
> > > length(unique(data_expanded$category)) , byrow = TRUE, dimnames =
> > > list( unique(data_expanded$year), unique(data_expanded$category) ) )
> > >
> > >
> > > data_original
> > > data_expanded
> > > mat
> > >
> > > On Fri, Oct 21, 2022 at 5:03 PM Kelly Thompson <kt1572757 using gmail.com> wrote:
> > > >
> > > > ###
> > > > #I have data presented in a "vertical" data frame as shown below in
> > > > data_original.
> > > > #I want this data in a matrix or "grid", as shown below.
> > > > #What I show below seems like one way this can be done.
> > > >
> > > > #My question: Are there easier or better ways to do this, especially
> > > > in Base R, and also in R packages?
> > > >
> > > > #reproducible example
> > > >
> > > > data_original <- data.frame(year = c('1990', '1999', '1990', '1989'),
> > > > size = c('s', 'l', 'xl', 'xs'), n = c(99, 33, 3, 4) )
> > > >
> > > > data_expanded <- expand.grid(unique(data_original$year),
> > > > unique(data_original$size), stringsAsFactors = FALSE )
> > > > colnames(data_expanded) <- c('year', 'size')
> > > > data_expanded <- merge(data_expanded, data_original, all = TRUE)
> > > >
> > > > mat <- matrix(data = data_expanded $n, nrow =
> > > > length(unique(data_expanded $year)), ncol =
> > > > length(unique(data_expanded $size)) , byrow = TRUE, dimnames = list(
> > > > unique(data_expanded$year), unique(data_expanded$size) ) )
> > > >
> > > > data_original
> > > > data_expanded
> > > > mat
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list