[R] format.data.frame and NA control

Peter Dalgaard p.dalgaard at biostat.ku.dk
Mon Oct 22 20:18:29 CEST 2007


Sebastian P. Luque wrote:
> Hi,
>
> Is there a more efficient way to output NA strings as empty strings in
> format.data.frame than this:
>
> ---<---------------cut here---------------start-------------->---
> R> tt <- data.frame(a=c(NA, rnorm(8), NA), b=c(NA, letters[1:8], NA))
> R> tt <- format(tt, digits=5, trim=TRUE)
> R> tt
>            a  b
> 1         NA NA
> 2   2.012460  a
> 3   0.364181  b
> 4   1.398317  c
> 5   0.730969  d
> 6  -1.321741  e
> 7   0.081472  f
> 8   2.019201  g
> 9   0.090003  h
> 10        NA NA
> R> as.data.frame(lapply(tt, function(x) {x[x == "NA"] <- ""; x}))
>            a b
> 1             
> 2   2.012460 a
> 3   0.364181 b
> 4   1.398317 c
> 5   0.730969 d
> 6  -1.321741 e
> 7   0.081472 f
> 8   2.019201 g
> 9   0.090003 h
> 10            
> ---<---------------cut here---------------end---------------->---
>
> Thanks.
>   
I suspect that there's a bug lurking in here. I get

 > format(c(1,NA),na.encode=TRUE)
[1] " 1" "NA"
 > format(c(1,NA),na.encode=FALSE)
[1] " 1" "NA"

I.e., they give the same thing, whereas I would expect the latter to give

 > c("1",NA)
[1] "1" NA

The point is that if the NA had been passed through like that, then you 
could simply have used print(tt, na.print="", ...), but as it is:
 > print(tt, na.print="", digits=5)
          a b
1        NA 
2   0.60110 a
3   0.40988 b
4  -1.45437 c
5   1.58159 d
6   0.52801 e
7  -0.52988 f
8  -1.63540 g
9  -0.38973 h
10       NA 

... it only works on character columns, not the numeric ones.
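
A minimal sketch of one possible workaround, given that numeric NAs come out of format() as the string "NA" regardless of na.encode (as shown above): format each column, then blank the entries by their NA positions in the original data rather than by comparing strings. The helper name format_blank_na and the small stand-in data frame are hypothetical, introduced only for illustration; this is not a base R function or a claim about how format.data.frame should be fixed.

```r
# Hypothetical helper (not part of base R): format each column, then blank
# the entries that were NA in the *original* data.
format_blank_na <- function(df, ...) {
  cols <- lapply(df, function(col) {
    out <- as.character(format(col, ...))  # NA is encoded as "NA" here
    out[is.na(col)] <- ""                  # overwrite by original NA position
    out
  })
  as.data.frame(cols, stringsAsFactors = FALSE, row.names = row.names(df))
}

# Small deterministic stand-in for the tt example above
tt <- data.frame(a = c(NA, 1.5, 2.25, NA), b = c(NA, "x", "y", NA))
res <- format_blank_na(tt, digits = 5, trim = TRUE)
print(res)
```

Because the blanking keys on is.na() of the unformatted column, a genuine character value "NA" in the data would be left alone, which the x == "NA" comparison cannot guarantee.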

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


