[R] Different results when converting a matrix to a data.frame

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Wed Nov 16 17:43:13 CET 2016


I will start by admitting I don't know the answer to your question.

However, I am responding because I think this should not be an issue in real-life use of R. Data frames are lists of distinct vectors, each of which has its own reason for being present in the data, and normally each has its own storage mode. Using a matrix as a shortcut to create many columns at once does not change this fundamental difference between data frames and matrices. You should not be surprised that putting the finishing touches on this transformation takes some personal attention.
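To make the distinction concrete, here is a small sketch. The column names (id, label, flag) are made up for illustration and are not from the original question.

```r
# A matrix has a single storage mode shared by every cell.
m <- matrix(NA, 2, 2)
typeof(m)

# A data frame is a list of column vectors, and each column
# keeps its own storage mode.
DF <- data.frame(id = 1:2,
                 label = c("a", "b"),
                 flag = c(TRUE, NA),
                 stringsAsFactors = FALSE)
is.list(DF)
col_types <- sapply(DF, typeof)
col_types
```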

Normally you should give explicit names to each column by passing the columns as named arguments to the data.frame function. When using a matrix as a shortcut, you should either immediately follow the creation of the data frame with a names(DF) <- assignment, or wrap it in a setNames function call.

setNames( data.frame(matrix(NA, 2, 2)), c( "ColA", "ColB" ) )
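For comparison, the naming approaches described above can be put side by side. This is only a sketch; DF1, DF2 and DF3 are illustrative names.

```r
# Option 1: name the columns directly as arguments to data.frame()
DF1 <- data.frame(ColA = c(NA, NA), ColB = c(NA, NA))

# Option 2: build from a matrix, then assign the names right away
DF2 <- data.frame(matrix(NA, 2, 2))
names(DF2) <- c("ColA", "ColB")

# Option 3: wrap the conversion in setNames(); here the default
# names do not matter because they are replaced immediately
DF3 <- setNames(as.data.frame(matrix(NA, 2, 2)), c("ColA", "ColB"))

names(DF1)
names(DF2)
names(DF3)
```

Whichever conversion function is used, the default names never survive, so the X versus V difference has no effect on the result.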

Note that using a matrix to create many columns is memory inefficient, because you start by setting aside a single block of memory (the matrix) and then you copy that data, one column at a time, into separate vectors for use in the data frame. If working with large data you might want to consider allocating each column separately from the beginning.

N <- 2                # number of rows
nms <- c( "A", "B" )  # column names
# allocate one vector of N missing values per column name
as.data.frame( setNames( lapply( nms, function(n){ rep( NA, N ) } ), nms ) )

which is not as convenient, but illustrates that data frames are truly different from matrices.
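Since each column normally has its own storage mode, allocating columns separately also lets you pick the type of each one up front. A minimal sketch, assuming one numeric and one character column (that choice is mine, not from the original question):

```r
N <- 2

# Typed missing values fix each column's storage mode from the start,
# so later assignments do not have to coerce a logical NA column.
DF <- data.frame(
  A = rep(NA_real_, N),
  B = rep(NA_character_, N),
  stringsAsFactors = FALSE
)

# Fill in values later, as in the original use case
DF$A[1] <- 3.14
DF$B[1] <- "first"

sapply(DF, typeof)
```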
-- 
Sent from my phone. Please excuse my brevity.

On November 16, 2016 7:20:38 AM PST, G.Maubach at weinwolf.de wrote:
>Hi All,
>
>I built an empty data frame to fill it with values later. I did the 
>following:
>
>-- cut --
>matrix(NA, 2, 2)
>     [,1] [,2]
>[1,]   NA   NA
>[2,]   NA   NA
>> data.frame(matrix(NA, 2, 2))
>  X1 X2
>1 NA NA
>2 NA NA
>> as.data.frame(matrix(NA, 2, 2))
>  V1 V2
>1 NA NA
>2 NA NA
>-- cut --
>
>Why does data.frame deliver different results than as.data.frame with 
>regard to the variable names (V instead of X)?
>
>Kind regards
>
>Georg


