[R] strange behavior in data frames with duplicated column names

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue May 8 20:10:58 CEST 2007


First, you should not be using colnames<- on a data frame: it is meant for 
matrices.  Use names<- for data frames (and as.data.frame to convert a 
matrix to a data frame).
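
A minimal sketch of the first point, using made-up data. The object names
m and df and the column labels are illustrative only; they are not taken
from the original post.

```r
# Build a small matrix and convert it explicitly to a data frame.
m <- matrix(1:12, ncol = 3)
df <- as.data.frame(m)        # a matrix without colnames gets V1, V2, V3

# Rename the columns with names<-, the data-frame idiom.
names(df) <- c("A", "B", "C")

print(names(df))
```

names<- works here because a data frame is a list of columns, so its
names are simply the names of that list.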

Second, whereas duplicate row names are not allowed in a data frame, 
duplicate column names are allowed, but at your own risk.
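
The asymmetry can be checked directly. The sketch below is illustrative:
the data and the helper variable rows_accepted are hypothetical, and
make.unique() is shown only as one possible way to repair duplicated
column names, not as a recommendation from this thread.

```r
df <- data.frame(a = 1:4, b = 5:8, c = 9:12)

# Duplicate row names are rejected with an error (caught here so the
# script keeps running; the accompanying warning is suppressed).
rows_accepted <- suppressWarnings(tryCatch({
  row.names(df) <- c("First", "Second", "Third", "Third")
  TRUE
}, error = function(e) FALSE))

# Duplicate column names are accepted silently by names<-.
names(df) <- c("A", "B", "A")
dup_index <- anyDuplicated(names(df))   # non-zero when a name repeats

# One way to repair them: make.unique() appends .1, .2, ... to repeats.
fixed <- make.unique(names(df))
names(df) <- fixed
print(names(df))
```

Checking anyDuplicated(names(df)) before indexing by name is a cheap
guard when column names come from an untrusted source.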

Third, there is an 'optimization too far' here, which I will change in 
2.5.0 patched.  Often with R development there is a tradeoff between speed 
and generality.
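
Until a fix is installed, one defensive option (an assumption on my part,
not something prescribed in this thread) is to extract the column as a
list element by position and then index the row, which does not rely on
the column names being unique. The variable name value is hypothetical.

```r
# Recreate the data frame from the report, duplicated names included.
x <- matrix(1:12, ncol = 3)
x.df <- as.data.frame(x)
names(x.df) <- c("A", "B", "A")

# Take the third column by position, then the second element of it.
value <- x.df[[3]][2]
print(value)
```

This sidesteps the two-index form x.df[2,3] entirely; whether that form
misbehaves depends on the R version in use.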

On Tue, 8 May 2007, William Revelle wrote:

> Dear R gurus,
>
> There is an interesting problem with accessing specific items in a
> column of data frame that has incorrectly been given a duplicate
> name, even though addressing the item by row and column number.
> Although the column is correctly listed, an item addressed by row and
> column number gives the item with the correct row and the original
> not the duplicated column.
>
> Here are the instructions to get this problem
>
> x <- matrix(seq(1:12),ncol=3)
> colnames(x) <- c("A","B","A")   #a redundant name for column 3
> x.df <- data.frame(x)
> x.df        #the redundant name is corrected
> x.df[,3]    #show the column -- this always works
> x.df[2,3]   #this works here
> #now incorrectly label the columns with a duplicate name
> colnames(x.df) <- c("A","B","A")  #the redundant name is not detected
> x.df
> x.df[,3]     #this works as above and shows the column
> x.df[2,3]    #but this gives the value of the first column, not the third  <---
> rownames(x.df) <- c("First","Second","Third","Third")  #detects duplicate name
> x.df
> x.df[4,]     #correct fourth row and corrected column names!
> x.df[4,3]    #wrong column
> x.df         #still has the original names with the duplication
>
>
> and corresponding output:
>
>>  x <- matrix(seq(1:12),ncol=3)
>>  colnames(x) <- c("A","B","A")   #a redundant name for column 3
>>  x.df <- data.frame(x)
>>  x.df        #the redundant name is corrected
>   A B A.1
> 1 1 5   9
> 2 2 6  10
> 3 3 7  11
> 4 4 8  12
>>  x.df[,3]    #show the column -- this always works
> [1]  9 10 11 12
>>  x.df[2,3]   #this works here
> [1] 10
>>  #now incorrectly label the columns with a duplicate name
>>  colnames(x.df) <- c("A","B","A")  #the redundant name is not detected
>>  x.df
>   A B  A
> 1 1 5  9
> 2 2 6 10
> 3 3 7 11
> 4 4 8 12
>>  x.df[,3]     #this works as above and shows the column
> [1]  9 10 11 12
>>  x.df[2,3]    #but this gives the value of the first column, not the third  <---
> [1] 2
>>  rownames(x.df) <- c("First","Second","Third","Third")  #detects duplicate name
> Error in `row.names<-.data.frame`(`*tmp*`, value = c("First", "Second",  :
> 	duplicate 'row.names' are not allowed
>>  x.df
>   A B  A
> 1 1 5  9
> 2 2 6 10
> 3 3 7 11
> 4 4 8 12
>>  x.df[4,]     #correct fourth row and corrected column names!
>   A B A.1
> 4 4 8  12
>>  x.df[4,3]    #wrong column
> [1] 4
>>  x.df         #still has the original names with the duplication
>
>>  unlist(R.Version())
>   platform        "i386-apple-darwin8.9.1"
>   arch            "i386"
>   os              "darwin8.9.1"
>   system          "i386, darwin8.9.1"
>   status          "Patched"
>   major           "2"
>   minor           "5.0"
>   year            "2007"
>   month           "04"
>   day             "25"
>   svn rev         "41315"
>   language        "R"
>   version.string  "R version 2.5.0 Patched (2007-04-25 r41315)"
>>
>
>
> Bill
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


