[R] Bug in print for data frames?

Rui Barradas
Thu Oct 26 15:02:49 CEST 2023


Hello,

Inline.

At 13:32 on 26/10/2023, Ebert, Timothy Aaron wrote:
> The "problem" goes away if you use
> 
> x$C <- y[1,]

Actually, if I understand correctly, the OP wants the column:


x$C <- y[,1]


In this case both will produce the same output because y is a df with only 
one row and one column. But that is a very special case; in general you 
would extract the column.
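
To make the difference visible, here is a small sketch with a two-row y. The values in y and the names col_vec, row_val and sub_df are made up for illustration; they are not from the original post.

```r
# Hypothetical two-row data, not from the original post
x <- data.frame(A = c(1, 4), B = c(2, 5), C = c(3, 6))
y <- data.frame(A = c(10, 20))

col_vec <- y[, 1]   # first column, dropped to a plain numeric vector
row_val <- y[1, ]   # first row; with a single column this drops to one number
sub_df  <- y[1]     # one-column data.frame, still carrying the name "A"

x$C <- col_vec      # C becomes an ordinary numeric column: 10, 20
str(x)

x2 <- x
x2$C <- sub_df      # C becomes a nested data.frame, as in the OP
names(x2)           # the top-level name is still "C"
str(x2)

x3 <- x
x3$C <- row_val     # a length-one value is recycled to every row
x3$C
```

So y[1, ] does not fail here, but it silently fills the whole column with the first value, which is probably not what was intended.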

Hope this helps,

Rui Barradas

> 
> If you have another row in your x, say:
> x <- data.frame(A=c(1,4), B=c(2,5), C=c(3,6))
> 
> then your code
> x$C <- y[1]
> returns an error.
> 
> If y has the same number of rows as x$C then R has the same outcome as in your example.
> 
> It looks like your code tells R to replace all of column C (including the name) with all of vector y.
> 
> Maybe unexpected, but not a bug. It is consistent.
> 
> 
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Rui Barradas
> Sent: Thursday, October 26, 2023 6:43 AM
> To: Christian Asseburg <rhelp using moin.fi>; r-help using r-project.org
> Subject: Re: [R] Bug in print for data frames?
> 
> At 07:18 on 25/10/2023, Christian Asseburg wrote:
>> Hi! I came across this unexpected behaviour in R. First I thought it was a bug in the assignment operator <- but now I think it's maybe a bug in the way data frames are being printed. What do you think?
>>
>> Using R 4.3.1:
>>
>> > x <- data.frame(A = 1, B = 2, C = 3)
>> > y <- data.frame(A = 1)
>> > x
>>   A B C
>> 1 1 2 3
>> > x$B <- y$A # works as expected
>> > x
>>   A B C
>> 1 1 1 3
>> > x$C <- y[1] # makes C disappear
>> > x
>>   A B A
>> 1 1 1 1
>> > str(x)
>> 'data.frame':   1 obs. of  3 variables:
>>  $ A: num 1
>>  $ B: num 1
>>  $ C:'data.frame':   1 obs. of  1 variable:
>>   ..$ A: num 1
>>
>> Why does the print(x) not show "C" as the name of the third element? I did mess up the data frame (and this was a mistake on my part), but finding the bug was harder because print(x) didn't show the C any longer.
>>
>> Thanks. With best wishes -
>>
>> . . . Christian
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> Hello,
> 
> To expand on the good answers already given, I will present two other example data sets.
> 
> Example 1. Imagine that instead of assigning just one column from y to x$C you assign two columns. The result is a data.frame column. Note what is displayed as the column names.
> And unlike with `[`, the operator `[[` doesn't work with columns 1:2. To get plain vectors you will have to extract the columns y$A and y$B one by one.
> 
> 
> 
> x <- data.frame(A = 1, B = 2, C = 3)
> y <- data.frame(A = 1, B = 4)
> str(y)
> #> 'data.frame':    1 obs. of  2 variables:
> #>  $ A: num 1
> #>  $ B: num 4
> 
> x$C <- y[1:2]
> x
> #>   A B C.A C.B
> #> 1 1 2   1   4
> 
> str(x)
> #> 'data.frame':    1 obs. of  3 variables:
> #>  $ A: num 1
> #>  $ B: num 2
> #>  $ C:'data.frame':   1 obs. of  2 variables:
> #>   ..$ A: num 1
> #>   ..$ B: num 4
> 
> x[[1:2]]  # doesn't work
> #> Error in .subset2(x, i, exact = exact): subscript out of bounds
> 
> 
> 
> Example 2. Sometimes it is useful to get a result like this first and then correct the resulting df, for instance when computing more than one summary statistic.
> 
> str(agg) below shows that the summary statistics are stored as a matrix, so you have a matrix column. And once again the displayed names reflect that.
> 
> The trick to make the result a df is to extract all but the last column as a sub-df, extract the last column's values as a matrix (which it is) and then cbind the two together.
> 
> cbind is a generic function. Since the first argument to cbind is a sub-df, the method called is cbind.data.frame and the result is a df.
> 
> 
> 
> df1 <- data.frame(A = rep(c("a", "b", "c"), 5L), X = 1:30)
> 
> # the anonymous function computes more than one summary statistic
> # note that it returns a named vector
> agg <- aggregate(X ~ A, df1, \(x) c(Mean = mean(x), S = sd(x)))
> agg
> #>   A    X.Mean       X.S
> #> 1 a 14.500000  9.082951
> #> 2 b 15.500000  9.082951
> #> 3 c 16.500000  9.082951
> 
> # similar effect as in the OP. The difference is that the last
> # column is a matrix, not a data.frame
> str(agg)
> #> 'data.frame':    3 obs. of  2 variables:
> #>  $ A: chr  "a" "b" "c"
> #>  $ X: num [1:3, 1:2] 14.5 15.5 16.5 9.08 9.08 ...
> #>   ..- attr(*, "dimnames")=List of 2
> #>   .. ..$ : NULL
> #>   .. ..$ : chr [1:2] "Mean" "S"
> 
> # nc is just a convenience, avoids repeated calls to ncol
> nc <- ncol(agg)
> cbind(agg[-nc], agg[[nc]])
> #>   A Mean        S
> #> 1 a 14.5 9.082951
> #> 2 b 15.5 9.082951
> #> 3 c 16.5 9.082951
> 
> # all is well
> cbind(agg[-nc], agg[[nc]]) |> str()
> #> 'data.frame':    3 obs. of  3 variables:
> #>  $ A   : chr  "a" "b" "c"
> #>  $ Mean: num  14.5 15.5 16.5
> #>  $ S   : num  9.08 9.08 9.08
> 
> 
> 
> If the anonymous function hadn't returned a named vector, the new column names would have been "1", "2". Try it.
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> 
> 


