[Rd] Data frames and row names

Tue Aug 15 08:11:32 CEST 2006

On Mon, 14 Aug 2006, Henrik Bengtsson wrote:

> In R-devel v2.4.0 NEWS:
> 
>     o	The 'row.names' of a data frame may be stored internally as an
> 	integer or character vector.  This can result in considerably
> 	more compact storage (and more logical row names from rbind)
> 	when the row.names are 1:nrow(x).  However, such data frames
> 	are not compatible with earlier versions of R: this can be
> 	ensured by supplying a character vector as 'row.names'.
> 
> This is great.
> 
> With row.names == NULL for 1:nrow(x) the storage would be even more
> compact.

A few bytes more compact.  Some day you may get up to the next few lines
of NEWS, which say:

	The internal storage of row.names = 1:n just records 'n' for
	efficiency with very long vectors.

(BTW, this is four-month-old news, hence my 'some day' comment.)
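
For anyone who wants to look at the compact form directly, here is a minimal
sketch. It assumes a current version of R, where the helper .row_names_info()
is available; that helper is not mentioned in the NEWS entry above, and the
exact internal layout is an implementation detail that may differ between
versions.

```r
# Sketch only: inspect how "automatic" row names are stored.
# Assumes a recent R in which .row_names_info() exists.
df <- data.frame(a = 1:10)

# attr() hands back the expanded row names, so length() gives the row
# count; this is what the dim.data.frame shown in this thread relies on.
n_from_attr <- length(attr(df, "row.names"))

# .row_names_info() reports on the internal form without expanding it.
internal <- .row_names_info(df, type = 0L)  # compact internal representation
n_signed <- .row_names_info(df, type = 1L)  # negative for automatic row names
n_rows   <- .row_names_info(df, type = 2L)  # number of rows

print(internal)
print(c(n_from_attr, n_signed, n_rows))
```

The point is that only 'n' needs to be recorded, however long the data frame
is, while code that asks for the "row.names" attribute still sees 1:n.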

>  I noticed that the number of rows is inferred from row
> names:
> 
> > dim.data.frame
> function (x)
> c(length(attr(x, "row.names")), length(x))
> <environment: namespace:base>
> 
> but couldn't the number of rows be inferred from the first column, if
> there are no row names?  I realize that this would break the case with
> zero-column data frames, e.g.
> 
> > df <- data.frame(a=1:10)
> > df[,-1]
> NULL data frame with 10 rows.
> 
> ...but maybe there is a way around that too.

Yes, see above.
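
As a small illustration of the zero-column case from the quoted example (a
sketch, assuming a current version of R; the variable names z, dims and
n_row_names are only for illustration), the row count survives because it
comes from the row names rather than from any column:

```r
# Sketch only: a data frame with no columns still knows its row count.
df <- data.frame(a = 1:10)
z  <- df[, -1]            # drop the only column, as in the quoted example

dims        <- dim(z)                        # rows, then columns
n_row_names <- length(attr(z, "row.names"))  # count taken from row names

print(dims)
print(n_row_names)
```

Inferring the number of rows from the first column would have nothing to work
with here, which is why the count is kept with the row names.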

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595