[R] Simple Stacking of Two Columns

Ebert,Timothy Aaron tebert @end|ng |rom u||@edu
Tue Apr 4 15:40:55 CEST 2023


Originally this post was just a comparison of execution times for different approaches to solving this problem.
Now I have a question:
   I changed the code for calculating a1 from c(c1, c2) to data.frame(c(c1, c2)). The reported execution times of all the other methods changed as well. What am I missing?

Original
For efficiency, the answer Avi provided is still the best option with these data. The append() method is next. Both of these approaches avoid having to build a data frame in the wide format. The slowest method is pivot_longer(). Note that the order of elements is different in the pivot_longer() approach. If the order matters, some of these answers will need sorting to give the correct output. Also note that a1 and a2 are vectors, while the others are data frames. However, all of these appear correct given our understanding of the problem.

library(tidyverse)
library(microbenchmark)
c1 <- c("Tom","Dick")
c2 <- c("Larry","Curly")
res <- microbenchmark(a1 <- c(c1, c2),
                      a2 <- append(c1, c2),
                      a3 <- {c3 <- data.frame(Name1=c1, Name2=c2)
                        stack(c3)},
                      a4 <- {c3 <- data.frame(Name1=c1, Name2=c2)
                        data.frame(Names=with(c3, c(Name1, Name2)))},
                      a5 <- {c3 <- data.frame(Name1=c1, Name2=c2)
                        data.frame(Names=unlist(c3), row.names=NULL)},
                      a6 <- {c3 <- data.frame(Name1=c1, Name2=c2)
                        pivot_longer(c3, cols=everything(),names_to="Names")},
                      a7 <- {c3 <- data.frame(Name1=c1, Name2=c2)
                        data.frame(Names=c(c3$Name1,c3$Name2))},
                      times=100L)
print(res)
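
To illustrate the point about element order, here is a minimal sketch (assuming the tidyr package is installed and R >= 4.0, so the columns are character rather than factor). The names `long` and `long_sorted` are introduced only for this illustration. pivot_longer() works through the data row by row, so the names come out interleaved; reordering by the source column restores the order that c(c1, c2) gives.

```r
library(tidyr)

c3 <- data.frame(Name1 = c("Tom", "Dick"), Name2 = c("Larry", "Curly"))

# pivot_longer() goes row by row, so the two columns are interleaved
long <- pivot_longer(c3, cols = everything(), names_to = "Names")
long$value
# "Tom" "Larry" "Dick" "Curly"

# order() keeps ties in their original order, so sorting on the source
# column name puts all of Name1 first, then all of Name2
long_sorted <- long[order(long$Names), ]
long_sorted$value
# "Tom" "Dick" "Larry" "Curly"
```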

Mean execution times for seven different methods where a1 <- c(c1,c2)
Method   Mean(ms)   CLD
a1           1998   a
a2           5749   a
a3        1055501   b
a4         592548   b
a5         682491   b
a6        6962660   c
a7         608337   b


Mean execution times for seven different methods where a1 <- data.frame(c(c1,c2))
Method       Mean   CLD
a1        272.467   b
a2          5.768   a
a3        907.171   d
a4        561.863   c
a5        581.989   c
a6       6371.465   e
a7        552.208   c
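
One possible reading of the two tables, offered as an assumption rather than something confirmed in this thread: when no unit is requested, microbenchmark picks the display unit from the timings themselves, so two runs can be printed in different units. Under that reading, a2 at 5749 in the first table and 5.768 in the second would be roughly the same time shown in nanoseconds and in microseconds. The sketch below fixes the unit explicitly so that separate runs are directly comparable; the expression labels `vec` and `df` are made up for the example.

```r
library(microbenchmark)

c1 <- c("Tom", "Dick")
c2 <- c("Larry", "Curly")

res <- microbenchmark(
  vec = c(c1, c2),
  df  = data.frame(c(c1, c2)),
  times = 100L
)

# Request a fixed unit instead of letting the print method choose one
print(res, unit = "us")

# summary() takes the same unit argument and returns a data frame,
# which makes it easy to confirm that only the scale differs
s_ns <- summary(res, unit = "ns")
s_us <- summary(res, unit = "us")
all.equal(s_ns$mean, s_us$mean * 1000)
```

Printing both of the runs above with the same unit argument would show whether the other methods really changed or only the scale did.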

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Richard O'Keefe
Sent: Tuesday, April 4, 2023 8:21 AM
To: Sparks, John <jspark4 using uic.edu>
Cc: r-help using r-project.org
Subject: Re: [R] Simple Stacking of Two Columns

Just to repeat:
you have

   NamesWide<-data.frame(Name1=c("Tom","Dick"),Name2=c("Larry","Curly"))

and you want

   NamesLong<-data.frame(Names=c("Tom","Dick","Larry","Curly"))

There must be something I am missing, because

   NamesLong <- data.frame(Names = c(NamesWide$Name1, NamesWide$Name2))

appears to do the job in the simplest possible manner.  There are all sorts of alternatives, such as
   data.frame(Name = as.vector(as.matrix(NamesWide[, 1:2])))

As for stack(), the main problem there was a typo (Names2 for Name2).

> stack(NamesWide)
  values   ind
1    Tom Name1
2   Dick Name1
3  Larry Name2
4  Curly Name2

If there were multiple columns, you might do

> stack(NamesWide[,c("Name1","Name2")])$values
[1] "Tom"   "Dick"  "Larry" "Curly"
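
As a quick check that the alternatives mentioned above agree, here is a small sketch in base R. It assumes R >= 4.0, where data.frame() keeps strings as character by default; the variable names `by_c`, `by_matrix`, `by_unlist`, `by_stack`, and `expected` are introduced only for this comparison, and unlist() is taken from the benchmark in the reply above rather than from this message.

```r
NamesWide <- data.frame(Name1 = c("Tom", "Dick"), Name2 = c("Larry", "Curly"))

# Four ways of stacking the two columns into one character vector
by_c      <- c(NamesWide$Name1, NamesWide$Name2)
by_matrix <- as.vector(as.matrix(NamesWide[, 1:2]))
by_unlist <- unlist(NamesWide, use.names = FALSE)
by_stack  <- stack(NamesWide)$values

expected <- c("Tom", "Dick", "Larry", "Curly")

# All four stack column by column, so each comparison should be TRUE
identical(by_c, expected)
identical(by_matrix, expected)
identical(by_unlist, expected)
identical(by_stack, expected)

# Wrap whichever vector you prefer in a one-column data frame
NamesLong <- data.frame(Names = by_c)
```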


On Tue, 4 Apr 2023 at 03:09, Sparks, John <jspark4 using uic.edu> wrote:

> Hi R-Helpers,
>
> Sorry to bother you, but I have a simple task that I can't figure out 
> how to do.
>
> For example, I have some names in two columns
>
> NamesWide<-data.frame(Name1=c("Tom","Dick"),Name2=c("Larry","Curly"))
>
> and I simply want to get a single column
> NamesLong<-data.frame(Names=c("Tom","Dick","Larry","Curly"))
> > NamesLong
>   Names
> 1   Tom
> 2  Dick
> 3 Larry
> 4 Curly
>
>
> Stack produces an error
> NamesLong<-stack(NamesWide$Name1,NamesWide$Names2)
> Error in if (drop) { : argument is of length zero
>
> So does bind_rows
> > NamesLong<-dplyr::bind_rows(NamesWide$Name1,NamesWide$Name2)
> Error in `dplyr::bind_rows()`:
> ! Argument 1 must be a data frame or a named atomic vector.
> Run `rlang::last_error()` to see where the error occurred.
>
> I tried making separate dataframes to get around the error in 
> bind_rows but it puts the data in two different columns
> Name1<-data.frame(c("Tom","Dick"))
> Name2<-data.frame(c("Larry","Curly"))
> NamesLong<-dplyr::bind_rows(Name1,Name2)
> > NamesLong
>   c..Tom....Dick.. c..Larry....Curly..
> 1              Tom                <NA>
> 2             Dick                <NA>
> 3             <NA>               Larry
> 4             <NA>               Curly
>
> gather makes no change to the data
> NamesLong<-gather(NamesWide,Name1,Name2)
> > NamesLong
>   Name1 Name2
> 1   Tom Larry
> 2  Dick Curly
>
>
> Please help me solve what should be a very simple problem.
>
> Thanks,
> John Sparks
>
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.r-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
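
On the bind_rows attempt in the quoted message: bind_rows() matches columns by name, and the two single-column data frames there were created without names, so each received a different auto-generated name and ended up in its own column. A minimal sketch of the fix, using base rbind() so that it runs without dplyr (with matching names, dplyr::bind_rows(Name1, Name2) stacks the rows in the same way):

```r
# Without explicit names, the column names are derived from the expressions
names(data.frame(c("Tom", "Dick")))
# "c..Tom....Dick.."

# Giving both data frames the same column name lets the rows stack
Name1 <- data.frame(Names = c("Tom", "Dick"))
Name2 <- data.frame(Names = c("Larry", "Curly"))
NamesLong <- rbind(Name1, Name2)
NamesLong$Names
# "Tom" "Dick" "Larry" "Curly"
```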
