[R] How to Reformat a dataframe

@vi@e@gross m@iii@g oii gm@ii@com
Sat Oct 28 17:23:15 CEST 2023


Paul,

I have snipped your long message and want to suggest another way of
thinking about the problem.

You have received other good suggestions, and I would likely have used
something similar, probably within dplyr/tidyverse, but consider
something simpler.

You seem to be viewing a data.frame as something like a matrix that you
want to reformat. There are similarities, but a data.frame is also
different. A matrix may actually be the right way for you to deal with your
data. Can you read it in as a matrix, or must it be a data.frame?

The thing about a matrix is that underneath, it is just a linear vector,
which is what you really seem to want. All your columns seem to be the same
numeric type, and perhaps it does not matter whether the values come out in
row-major or column-major order. So consider a smaller example. I am making
a small data.frame for illustration:

> small <- data.frame(A=1:4, B=5:8, C=9:12)
> small
  A B  C
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12

Now I am making it a matrix and keeping the columns the same:

> small.mat <- as.matrix(small)
> small.mat
     A B  C
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12

This can be linearized into a vector in many ways such as this:

> small.vec <- as.vector(small.mat)
> small.vec
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
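
On the question of order: as.vector() reads the matrix column by column. If
row-by-row order is wanted instead, one option (a minimal sketch, with
by.col and by.row as names I made up for illustration) is to transpose
first:

```r
small <- data.frame(A = 1:4, B = 5:8, C = 9:12)
small.mat <- as.matrix(small)

# as.vector() walks the matrix column by column (column-major order).
by.col <- as.vector(small.mat)

# Transposing first yields row-by-row (row-major) order instead.
by.row <- as.vector(t(small.mat))

print(by.col)
print(by.row)
```

The rest of this message sticks with the column-major small.vec.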

You can make that into a data.frame if you like:

> revised <- data.frame(colname=small.vec)
> revised
   colname
1        1
2        2
3        3
4        4
5        5
6        6
7        7
8        8
9        9
10      10
11      11
12      12

Of course, the above can be combined into more of a one-liner or made more
efficient. But in some cases, if you know the exact details of your
data.frame, you can spell out a way to combine the columns trivially. In my
example, I have three columns that can simply be concatenated into a vector
like so:

> small.onecol <- data.frame(onecol=c(small$A, small$B, small$C))
> small.onecol
   onecol
1       1
2       2
3       3
4       4
5       5
6       6
7       7
8       8
9       9
10     10
11     11
12     12

This is not a generalized solution but is simple enough even with the number
of columns you have. You are simply concatenating the vectors into one
bigger one. If you want to connect many columns, a short loop can do it, as
in:

> cols <- colnames(small)
> cols
[1] "A" "B" "C"
> 
> new <- vector(mode="numeric", length=0)
> for (col in cols) {
+   new <- append(new, small[[col]])
+ }
> new
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
> 
> new.df <- data.frame(newname=new)
> new.df
   newname
1        1
2        2
3        3
4        4
5        5
6        6
7        7
8        8
9        9
10      10
11      11
12      12
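
If the loop feels like too much, base R can probably do the same
concatenation without one. This is only a sketch on the same toy data; the
names flat, flat.df and stacked are mine, and whether it suits your real
data depends on details I have not seen:

```r
small <- data.frame(A = 1:4, B = 5:8, C = 9:12)

# unlist() concatenates the columns in order; use.names = FALSE drops
# the element names (A1, A2, ...) it would otherwise generate.
flat <- unlist(small, use.names = FALSE)
flat.df <- data.frame(newname = flat)

# stack() also concatenates the columns, and additionally records the
# source column of each value in a factor column called "ind".
stacked <- stack(small)

print(flat)
print(head(stacked))
```

stack() is worth a look if you later need to know which original column
each value came from.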

The number of ways to do what you want is huge. You can pick a way that
makes more sense to you, especially the ones others have supplied, or one
that seems more efficient. As noted, all methods may also need to deal with
your NA issue at some stage.
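
Since I snipped the details, I can only guess at what you need done with
the NA values. Assuming they should simply be dropped once everything is in
one vector, a sketch on made-up data (small.na and the other names here are
hypothetical) might look like:

```r
# Hypothetical data: the small example with two missing values added.
small.na <- data.frame(A = c(1, NA, 3, 4), B = c(5, 6, NA, 8), C = 9:12)

# Concatenate all columns into one vector, as before.
flat <- unlist(small.na, use.names = FALSE)

# Keep only the entries that are not NA.
flat.clean <- flat[!is.na(flat)]
clean.df <- data.frame(onecol = flat.clean)

print(clean.df)
```

If the NA values instead need to stay as placeholders, skip the filtering
step and keep flat as it is.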


