[Rd] Non-unique column names in data frames

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Apr 3 09:21:31 CEST 2007


On Sun, 1 Apr 2007, John Fox wrote:

> Dear r-devel members,
>
> It's just been brought to my attention that R permits non-unique column
> names in data frames -- e.g., via assignment to names() or colnames(). This
> behaviour is consistent with the help files (as I discovered), but it's not
> consistent with the behaviour of rownames() and row.names(). For example,

??  Matrices and data frames behave differently here, but rownames() and 
row.names() do the same thing as each other on each class.

>
> 	row.names(airquality) <- rep("a", nrow(airquality))
>
> generates an error, but

as does rownames().

>
> 	names(airquality) <- rep("a", ncol(airquality))
>
> or even
>
> 	names(airquality) <- rep("", ncol(airquality))
>
> do not.
>
> I figure that there must be some rationale for this difference, but I can't
> think of what it might be. Any thoughts?

It's part of the definition of a data frame, from long ago (White Book 
p.60).  Think of the row names as a 'primary key' in the sense of a 
DBMS/SQL.
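
A minimal sketch of that difference, assuming a recent R. The objects df and m and the helper fails() are made up for illustration and are not from the original thread:

```r
# Hypothetical toy objects for illustration only.
df <- data.frame(x = 1:3, y = c("p", "q", "r"))
m  <- matrix(1:4, nrow = 2)

# Hypothetical helper: TRUE if evaluating 'expr' signals an error.
# Warnings raised along the way are muffled so only the error matters.
fails <- function(expr) {
    tryCatch({ suppressWarnings(expr); FALSE }, error = function(e) TRUE)
}

# Row names act like a key: each one identifies a single row.
row.names(df) <- c("r1", "r2", "r3")
df["r2", "x"]

# On a data frame, both replacement functions reject duplicates.
df_rows_fail  <- fails(row.names(df) <- rep("a", nrow(df)))
df_rows_fail2 <- fails(rownames(df)  <- rep("a", nrow(df)))

# On a matrix, duplicated row names are accepted.
m_rows_fail <- fails(rownames(m) <- rep("a", nrow(m)))

# Column names of a data frame are not checked in the same way.
names(df) <- rep("a", ncol(df))
names(df)
```

Here df_rows_fail and df_rows_fail2 should both be TRUE and m_rows_fail FALSE, while the final names(df) assignment goes through without complaint.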

Why the names are not also required to be non-empty and unique 
is something for the designer (and John Chambers has not (yet) replied), 
but it is clearly deliberate as data.frame(check.names=FALSE) is allowed.
One possible issue is that there are many ways to set names of a data 
frame, e.g. DF$name <- value can add a column, and checking them all could 
be tedious.  OTOH, setting row names is centralized (it is done inside
attr<-()).
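
To illustrate the check.names point, here is a small sketch (the data frames checked and unchecked are invented for the example). It also shows one possible way to repair duplicated names afterwards with make.unique(); that is a suggestion, not something the data frame code does for you:

```r
# Default (check.names = TRUE): names are passed through make.names(),
# so the duplicate is made unique.
checked <- data.frame(a = 1:2, a = 3:4)
names(checked)                      # "a" "a.1"

# With check.names = FALSE the duplicate is kept as supplied.
unchecked <- data.frame(a = 1:2, a = 3:4, check.names = FALSE)
names(unchecked)                    # "a" "a"

# Extraction by name then returns only the first matching column.
first_a <- unchecked$a

# DF$name <- value is one of the many routes by which a column
# (and hence a name) can be added.
unchecked$b <- c(TRUE, FALSE)

# One way to repair the names after the fact.
names(unchecked) <- make.unique(names(unchecked))
names(unchecked)                    # "a" "a.1" "b"
```

The second column of unchecked is reachable by name only after the repair, or by position via unchecked[[2]].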

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


