[R] Re: [Rd] corrupt data frame: columns will be truncated or padded with NAs in: format.data.frame(x, digits = digits)

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Feb 14 14:23:48 CET 2005

On Mon, 14 Feb 2005, Gregor GORJANC wrote:

> I am also sending this to r-help so that anyone can read it there and maybe 
> help me with my puzzle, in case it is trivial and I just don't see it.

Please don't, and especially do not after having removed the context.
So I have removed R-help from the follow-up.

> Prof Brian Ripley wrote:
> [... removed some ...]

The question I answered has been removed here, which is discourteous both 
to your helper and to your readers.

>> You add a column, not replace part of a non-existent column.  Isn't that 
>> obvious, given what you wrote?

Not if you subsequently remove what you wrote, of course.

> # OK. If I do
> tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D")))
> tmp[1:2, "y2"] <- 2
> tmp
> # I am changing nonexistent column y2 in data frame tmp.
> # If I do
> tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D")))
> tmp$y2 <- NA
> tmp[1:2, "y2"] <- 2
> tmp
> # I am changing an existing column. I understand the difference now. However,
> # it seems weird to me that this is OK (if column y2 does not yet exist)
> tmp["y2"] <- 2
> # but this is not
> tmp[1:2, "y2"] <- 2

What is `weird' is your insistence that this makes sense.  Columns in a 
data frame are required to be the same length.  How is the new column 
supposed to be made up to the correct length?  Padding with NAs is possible 
for a numeric column, but not sensible for a raw column or a data frame 
column or ....
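
A minimal sketch of the safe route, reusing the `tmp` example from the quoted
message: create the column at full length first, then replace part of it. The
choice of `NA_real_` as the initial value is an assumption made here so that
the column is numeric from the start; it is not something the original
exchange prescribes.

```r
# Sketch: create the column at full length first, then fill part of it.
tmp <- data.frame(y1 = 1:4, f1 = factor(c("A", "B", "C", "D")))

# A length-1 value is recycled to every row, so the new column is complete.
tmp$y2 <- NA_real_

# Rows 1:2 of an existing column are replaced; rows 3:4 keep their NA.
tmp[1:2, "y2"] <- 2

# Every column has the same length as the number of rows.
stopifnot(all(lengths(tmp) == nrow(tmp)))
print(tmp)
```

Because the padding value is chosen explicitly, nothing is left for the
replacement method to guess.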

>> There is a lot of basic documentation on data manipulation in R/S, and a 
>> whole chapter in MASS4.  Somehow most other people don't seem to find this 
>> a problem.
> I just ordered MASS4 last week and I am eager to get my hands on it. In 
> the meantime I have read quite a lot of documentation, and what I saw was 
> more or less
> tmp <- data.frame(y1=1:4, f1=factor(c("A", "B", "C", "D")))
> tmp$y2 <- 1:4
> tmp$y3 <- 2*tmp$y1
> ...
> ...
> i.e. everybody adds a full column to the data frame. But I would like to 
> add just one part of a column.

But you cannot do so without getting a corrupt data frame. All you can hope 
for is that a full column is added, with something arbitrary appended to 
your input to make up the length.
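
One way to make that `something arbitrary' explicit is to wrap the steps in a
small function. The sketch below is illustrative only: `add_partial_column`
and its `fill` argument are hypothetical names introduced here, not part of
base R or of the original exchange.

```r
# Hypothetical helper (not part of base R): add a new column in which only
# `rows` receive `value`; every other row receives the caller's `fill`.
add_partial_column <- function(df, name, rows, value, fill = NA) {
  stopifnot(is.data.frame(df), is.character(name), !(name %in% names(df)))
  col <- rep(fill, nrow(df))  # full-length column, so the frame stays valid
  col[rows] <- value          # overwrite only the requested rows
  df[[name]] <- col
  df
}

tmp <- data.frame(y1 = 1:4, f1 = factor(c("A", "B", "C", "D")))
tmp <- add_partial_column(tmp, "y2", 1:2, 2)             # rows 3:4 become NA
tmp0 <- add_partial_column(tmp, "y3", 1:2, 2, fill = 0)  # rows 3:4 become 0
print(tmp0)
```

The caller, not the replacement method, decides what fills the remaining
rows, which is the decision the paragraph above says cannot be made
sensibly for every column type.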

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
