[R] How to see if row names of a dataframe are stored compactly

jim holtman jholtman at gmail.com
Sat Oct 14 04:14:27 CEST 2006


Take a look with 'dput' (here with n = 10 instead of 10000, to keep the
output short) and you will see the difference:

> row.names(x) <- 1:n
> dput(x)
structure(list(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), V2 = c(11,
12, 13, 14, 15, 16, 17, 18, 19, 20)), .Names = c("V1", "V2"), row.names = c(NA,
10), class = "data.frame")
> row.names(x) <- 2:(n+1)
> dput(x)
structure(list(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), V2 = c(11,
12, 13, 14, 15, 16, 17, 18, 19, 20)), .Names = c("V1", "V2"), row.names = c(2,
3, 4, 5, 6, 7, 8, 9, 10, 11), class = "data.frame")
>

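The 'row.names' attribute is different: for 1:n it is stored compactly as
c(NA, 10), i.e. just the row count, while for 2:(n+1) the full vector is
stored.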
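
The same check can be made without reading 'dput' output. What follows is
only a minimal sketch, assuming a version of R that provides
.row_names_info() (documented alongside .set_row_names in current releases;
it may not be available in 2.4.0). The helper name has_compact_row_names is
hypothetical and not part of base R.

```r
# Hypothetical helper (not part of base R): TRUE when the row names of the
# data frame `df` are held internally in the compact c(NA, n) form.
has_compact_row_names <- function(df) {
  stopifnot(is.data.frame(df))
  # type = 0L returns the internal "row.names" attribute without expanding it
  rn <- .row_names_info(df, 0L)
  is.integer(rn) && length(rn) == 2L && is.na(rn[1L])
}

n <- 10
x <- as.data.frame(matrix(seq_len(2 * n), nrow = n))

# Row names are 1:n here, so the compact form is expected
compact_before <- has_compact_row_names(x)

# Row names 2:(n+1) cannot be represented as c(NA, n)
row.names(x) <- 2:(n + 1)
compact_after <- has_compact_row_names(x)

print(c(before = compact_before, after = compact_after))
```

The helper looks only at the shape of the internal attribute (a length-2
integer vector whose first element is NA), which is the same thing 'dput'
shows as row.names = c(NA, ...).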

On 10/13/06, Hsiu-Khuern Tang <hsiu-khuern.tang at hp.com> wrote:
> Hi Gabor,
>
> * On Fri 07:59PM, 13 Oct 2006, Gabor Grothendieck (ggrothendieck at gmail.com) wrote:
> > Try this:
> >
> > >class(attributes(x)$row.names)
> > [1] "integer"
> > >rownames(x) <- as.character(rownames(x))
> > >class(attributes(x)$row.names)
> > [1] "character"
>
> Yes, but this doesn't show that row.names was stored as a _single_
> integer (3) instead of a vector of integers (1:3).
>
> Reading the changes again:
>
>    The internal storage of row.names = 1:n just records 'n', for
>            efficiency with very long vectors.
>
>    The "row.names" attribute must be a character or integer
>    vector, and this is now enforced by the C code.
>
> I think row.names is always _printed_ as a vector.  I had misinterpreted the
> help(row.names) paragraph in my original posting to mean that the internal
> storage can be revealed by attributes(x)$row.names.  That paragraph implies
> that attributes(x)$row.names and attr(x, "row.names") can have different
> classes, but I can't create such an example.
>
> I did this experiment:
>
> > n <- 10000
> > x <- as.data.frame(matrix(seq(len=2*n), nrow=n))
> > head(x)
>   V1    V2
> 1  1 10001
> 2  2 10002
> 3  3 10003
> 4  4 10004
> 5  5 10005
> 6  6 10006
> > class(attributes(x)$row.names)
> [1] "integer"
> > save(x, file="x1", compress=FALSE)
> > row.names(x) <- 2:(n+1)
> > class(attributes(x)$row.names)
> [1] "integer"
> > save(x, file="x2", compress=FALSE)
> > subset(file.info(c("x1", "x2")), select=size)
>      size
> x1  80205
> x2 120197
>
> The difference in size is about nrow(x) * 4 bytes.  I think this shows that 1:n
> was stored compactly as a single integer but 2:(n+1) was not.
>
> > On 10/13/06, Hsiu-Khuern Tang <hsiu-khuern.tang at hp.com> wrote:
> > >Reading the list of changes for R version 2.4.0, I was happy to see that
> > >the row names of dataframes can be stored compactly (as the integer n when
> > >row.names(df) is 1:n).
> > >
> > >help(row.names) contains this paragraph:
> > >
> > >   Row names of the form '1:n' for 'n > 2' are stored internally in a
> > >   compact form, which might be seen by calling 'attributes' but never
> > >   via 'row.names' or 'attr(x, "row.names")'.
> > >
> > >I am unable to get attributes(x)$row.names to return just nrow(x).  Am I
> > >misreading the documentation?  Does "might be seen" mean "possibly in some
> > >future version of R" in this case?
> > >
> > >> (x <- as.data.frame(matrix(1:9, nrow=3)))
> > >  V1 V2 V3
> > >1  1  4  7
> > >2  2  5  8
> > >3  3  6  9
> > >> attributes(x)$row.names
> > >[1] 1 2 3
> > >> row.names(x) <- seq(len=nrow(x))
> > >> attributes(x)$row.names
> > >[1] 1 2 3
> > >
> > >Best,
> > >Hsiu-Khuern.
> > >
> > >______________________________________________
> > >R-help at stat.math.ethz.ch mailing list
> > >https://stat.ethz.ch/mailman/listinfo/r-help
> > >PLEASE do read the posting guide
> > >http://www.R-project.org/posting-guide.html
> > >and provide commented, minimal, self-contained, reproducible code.
> > >
>
> Best,
> Hsiu-Khuern.


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


