[R] length() misbehaving?

Marc Schwartz mschwartz at medanalytics.com
Fri Mar 14 17:22:48 CET 2003


>-----Original Message-----
>From: r-help-bounces at stat.math.ethz.ch 
>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of David Parkhurst
>Sent: Friday, March 14, 2003 9:35 AM
>To: r-help at stat.math.ethz.ch
>Subject: [R] length() misbehaving?
>
>
>I'm having a weird problem with length(), in R1.6.1 under 
>windows2000.  I have a dataframe called byyr, with ten 
>columns, the first of which is named cnd95.
>summary(byyr) shows that byyr$cnd95 contains the factor level 
>"tr" 66 times.  Also, when I enter byyr$cnd95 at the command 
>line, I can count 66 "tr" elements in the resulting vector.  
>However, when I enter
>
>n95trt <- length(byyr$cnd95[byyr$cnd95=="tr"])
>n95trt
>
>the result is 68!  Any ideas why this is happening, and how I 
>can fix the miscount? (That column also contains 69 entries of 
>"c", and (relevantly?) two NA's.)
>
>Thanks for any help.
>
>Dave Parkhurst


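It is expected.

Since NA represents a true unknown, the two NA's in your vector 'may
be' a "tr".  Thus, the comparison byyr$cnd95 == "tr" returns NA (not
TRUE or FALSE) for those two elements.  Indexing a vector with a
logical NA returns an NA element, so your subset holds the 66 "tr"
entries plus two NA's, and length() reports 68.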

Instead of indexing and then calling length(), you might want to sum
the logical comparison directly:

sum(byyr$cnd95 == "tr", na.rm = TRUE)

which counts the TRUE values and drops the two NA's.
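
To see the effect in isolation, here is a small sketch. The vector
cnd95 below is a made-up stand-in for byyr$cnd95 (66 "tr", 69 "c" and
two NA's, as described in the question), and the names is_tr, n_len,
n_sum and n_which are only for illustration:

```r
# Hypothetical stand-in for byyr$cnd95: 66 "tr", 69 "c", 2 NA
cnd95 <- factor(c(rep("tr", 66), rep("c", 69), NA, NA))

# Logical vector; it is NA wherever cnd95 is NA
is_tr <- cnd95 == "tr"

# Indexing with a logical NA yields an NA element, so the NA's are kept
n_len <- length(cnd95[is_tr])

# Summing the logical vector counts the TRUEs; na.rm = TRUE drops the NA's
n_sum <- sum(is_tr, na.rm = TRUE)

# which() leaves NA's out of the index, so this is another option
n_which <- length(which(is_tr))

print(c(n_len = n_len, n_sum = n_sum, n_which = n_which))
```

With this stand-in data, n_len should come out as 68 while n_sum and
n_which should both be 66.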

See ?sum

HTH,

Marc Schwartz


