[Rd] grep, gsub, sub have problems with NA values (PR#3078)

Warnes, Gregory R gregory_r_warnes at groton.pfizer.com
Sat May 24 09:35:49 MEST 2003


Oh dear, more careful checking shows that all elements of a factor get
converted to NA by formatC, but the results retain the factor levels:

> x <- factor(letters[1:5])
> formatC(x, width=8)
[1] <NA> <NA> <NA> <NA> <NA>
Levels: a b c d e

I have a hard time justifying this behavior.  I expected it to act like
format.char:

> format.char(x,width=8)
[1] "a       " "b       " "c       " "d       " "e       "
Warning message: 
format.char: coercing 'x' to 'character' in: format.char(x, width = 8) 

This came up when I was formatting all of the elements of a data frame to
width 8 so that I could create a fixed-width output file...
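For anyone hitting the same problem, here is a minimal workaround sketch. It assumes a recent R version (not the 2003 release discussed in this thread) and simply converts every column with as.character() before padding, so formatC() never sees a factor at all. The helper name pad_columns and the sample data frame are hypothetical, for illustration only.

```r
# Hypothetical helper (not part of base R): pad every column of a data
# frame to a fixed width. Each column is converted with as.character()
# first, so formatC() works on the factor labels, never on the factor.
pad_columns <- function(df, width = 8) {
  df[] <- lapply(df, function(col) {
    # flag = "-" left-justifies, matching the format.char output above
    formatC(as.character(col), width = width, flag = "-")
  })
  df
}

# Hypothetical sample data, including a level that contains "NA"
df <- data.frame(id = factor(c("a", "NAME")), n = c(1L, 22L))
padded <- pad_columns(df)

# Glue the padded columns into one fixed-width line per row
lines <- do.call(paste0, unname(as.list(padded)))
writeLines(lines)  # or writeLines(lines, con = "out.txt")
```

The explicit as.character() call is the design choice that matters here: it sidesteps whatever formatC() does with factor input, regardless of R version.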

-G

> -----Original Message-----
> From: Warnes, Gregory R 
> Sent: Saturday, May 24, 2003 8:17 AM
> To: Warnes, Gregory R; 'Thomas Lumley'
> Cc: 'r-devel at stat.math.ethz.ch'
> Subject: RE: [Rd] grep, gsub, sub have problems with NA 
> values (PR#3078)
> 
> 
> 
> I see that this came out garbled.  It should have read:
> 
> formatC also has problems: it incorrectly converts any 
> factor level *containing* the characters 'NA' to a missing value.
> 
> > > formatC(factor("NAME"),width=8)
> > [1] <NA>
> > Levels: NAME
> 
> -G
> 
> 
> > -----Original Message-----
> > From: Warnes, Gregory R 
> > Sent: Friday, May 23, 2003 5:22 PM
> > To: 'Thomas Lumley'; Warnes, Gregory R
> > Cc: r-devel at stat.math.ethz.ch
> > Subject: RE: [Rd] grep, gsub, sub have problems with NA 
> > values (PR#3078)
> > 
> > 
> > 
> > formatC also has the reverse problem: it detects any factor 
> > containing the string "NA" and converts it to a factor:
> > 
> > > formatC(factor("NAME"),width=8)
> > [1] <NA>
> > Levels: NAME
> > 
> > -Greg
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: Thomas Lumley [mailto:tlumley at u.washington.edu]
> > > Sent: Friday, May 23, 2003 11:47 AM
> > > To: gregory_r_warnes at groton.pfizer.com
> > > Cc: r-devel at stat.math.ethz.ch
> > > Subject: Re: [Rd] grep, gsub, sub have problems with NA 
> > > values (PR#3078)
> > > 
> > > 
> > > On Thu, 22 May 2003 gregory_r_warnes at groton.pfizer.com wrote:
> > > 
> > > >
> > > > In a string context, grep, gsub, sub are improperly treating NA
> > > > (missing) as the string "NA", and returning unexpected results.
> > > >
> > > 
> > > as were chartr, abbreviate, substr, substring, strsplit. Fixed in
> > > r-devel for the case of NA in the `main' string. Haven't yet decided
> > > what to do about
> > >   grep(as.character(NA), x)
> > > or
> > >   substr(x,1,2) <- as.character(NA)
> > > 
> > > 
> > > 
> > > 	-thomas
> > > 
> > 
> 





More information about the R-devel mailing list