# [R] strange behavior in data frames with duplicated column names

William Revelle wr at revelle.net
Tue May 8 16:26:43 CEST 2007

```
Dear R gurus,

There is an interesting problem with accessing specific items in a
column of a data frame that has incorrectly been given a duplicate
name, even when the item is addressed by row and column number.
Although the column itself is listed correctly, an item addressed by
row and column number comes from the correct row but from the original
column, not the duplicated one.

Here are the commands that reproduce the problem:

x <- matrix(seq(1:12),ncol=3)
colnames(x) <- c("A","B","A")   #a duplicate name for column 3
x.df <- data.frame(x)
x.df        #data.frame() makes the duplicate name unique (A.1)
x.df[,3]    #show the column -- this always works
x.df[2,3]   #this works here
#now incorrectly label the columns with a duplicate name
colnames(x.df) <- c("A","B","A")  #the redundant name is not detected
x.df
x.df[,3]     #this works as above and shows the column
x.df[2,3]    #but this gives the value of the first column, not the third  <---
rownames(x.df) <- c("First","Second","Third","Third")  #detects duplicate name
x.df
x.df[4,]     #correct fourth row, and the column names are de-duplicated!
x.df[4,3]    #wrong column: the value comes from column 1
x.df         #still has the original names with the duplication

and corresponding output:

>  x <- matrix(seq(1:12),ncol=3)
>  colnames(x) <- c("A","B","A")   #a duplicate name for column 3
>  x.df <- data.frame(x)
>  x.df        #data.frame() makes the duplicate name unique (A.1)
  A B A.1
1 1 5   9
2 2 6  10
3 3 7  11
4 4 8  12
>  x.df[,3]    #show the column -- this always works
[1]  9 10 11 12
>  x.df[2,3]   #this works here
[1] 10
>  #now incorrectly label the columns with a duplicate name
>  colnames(x.df) <- c("A","B","A")  #the redundant name is not detected
>  x.df
  A B  A
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12
>  x.df[,3]     #this works as above and shows the column
[1]  9 10 11 12
>  x.df[2,3]    #but this gives the value of the first column, not the third  <---
[1] 2
>  rownames(x.df) <- c("First","Second","Third","Third")  #detects duplicate name
Error in `row.names<-.data.frame`(`*tmp*`, value = c("First", "Second",  :
duplicate 'row.names' are not allowed
>  x.df
  A B  A
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12
>  x.df[4,]     #correct fourth row, and the column names are de-duplicated!
  A B A.1
4 4 8  12
>  x.df[4,3]    #wrong column: the value comes from column 1
[1] 4
>  x.df         #still has the original names with the duplication

>  unlist(R.Version())
                platform                arch
"i386-apple-darwin8.9.1"              "i386"
                      os              system
           "darwin8.9.1" "i386, darwin8.9.1"
                  status               major
               "Patched"                 "2"
                   minor                year
                   "5.0"              "2007"
                   month                 day
                    "04"                "25"
                 svn rev            language
                 "41315"                 "R"
                               version.string
"R version 2.5.0 Patched (2007-04-25 r41315)"
>

Bill

--
William Revelle		http://personality-project.org/revelle.html
Professor			http://personality-project.org/personality.html
Department of Psychology       http://www.wcas.northwestern.edu/psych/
Northwestern University	http://www.northwestern.edu/
Use R for statistics:                 http://personality-project.org/r

```
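A minimal workaround sketch in base R, assuming nothing beyond the functions shown: it checks for duplicated names with `anyDuplicated()`, pulls the cell out by position with `[[` (which selects the third column by index rather than by name), and then repairs the names with `make.unique()`, which adds the same kind of `.1` suffix that `data.frame()` produced when it renamed the second `A` to `A.1`. The variable names `dup_at`, `val_pos` and `val_fixed` are illustrative only. The sketch makes no claim about whether a particular R release still returns the wrong value for `x.df[2,3]`; it simply avoids depending on name lookup.

```r
# Rebuild the data frame from the report and give it duplicate names again.
x <- matrix(1:12, ncol = 3)
colnames(x) <- c("A", "B", "A")
x.df <- data.frame(x)                 # data.frame() renames to A, B, A.1
colnames(x.df) <- c("A", "B", "A")    # colnames<- accepts the duplicate

# 1. Detect duplicates: anyDuplicated() returns the index of the first
#    repeated element, or 0 when every name is unique.
dup_at <- anyDuplicated(names(x.df))  # 3: the second "A"

# 2. Extract by position only: [[3]] selects the third column by index,
#    then [2] takes its second element.
val_pos <- x.df[[3]][2]               # 10

# 3. Repair the names so that every column has a distinct name again.
#    make.unique() appends ".1", ".2", ... to later repeats.
names(x.df) <- make.unique(names(x.df))
val_fixed <- x.df[2, 3]               # 10 once the names are unique

print(dup_at)
print(val_pos)
print(names(x.df))
print(val_fixed)
```

Running the `anyDuplicated()` check right after any `colnames<-` or `names<-` assignment that might introduce a repeat gives the same early warning for columns that `row.names<-` already gives for rows.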