[R] length, mean, na.rm, na.omit...

Duncan Murdoch murdoch at stats.uwo.ca
Fri May 18 17:10:00 CEST 2007


On 5/18/2007 10:32 AM, Muenchen, Robert A (Bob) wrote:
> Hi All,
> 
> Can anyone tell me why the length function does not use na.rm? I know
> how to work around it, I'm just curious to know why such a useful option
> was left out.

length() is used very frequently in other functions, so it is encoded as
a primitive for speed.  Adding an optional argument to it would slow it
down.
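A minimal sketch of the usual workarounds, using a hypothetical vector `x`. Because length() counts NA entries like any other element, the known values have to be counted another way, for example with `sum(!is.na(x))` or `length(na.omit(x))`:

```r
# Hypothetical example vector with two missing values
x <- c(4, NA, 7, 1, NA)

# length() is a primitive and counts every element, NAs included
is.primitive(length)   # TRUE
length(x)              # 5

# Two common ways to count only the non-missing values
sum(!is.na(x))         # 3
length(na.omit(x))     # 3
```

The first form avoids building a new vector, which matters only for very long inputs.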

> I'm also interested in the logic of setting na.rm=TRUE as the default on
> mean, sd, etc. This is the opposite of the many other stat packages I
> have used, so I assume it provides some programming benefit that is not
> obvious to me.

That's also the opposite of what R does.  Did you mean to ask why 
na.rm=FALSE is the default?  I think it follows from thinking of NA as 
meaning "not known", rather than "missing at random".  If you don't know 
why values are missing, you may get biased results by calculating the
mean of the others, and R would rather not give you biased results.
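To illustrate the default with a hypothetical vector: under na.rm=FALSE a single NA makes the summary NA, and dropping the missing values has to be requested explicitly.  sd() follows the same convention as mean():

```r
# Hypothetical example vector with two missing values
x <- c(4, NA, 7, 1, NA)

# Default na.rm = FALSE: an unknown value makes the result unknown
mean(x)                 # NA
sd(x)                   # NA

# Explicitly dropping the NAs: summaries of 4, 7, 1 only
mean(x, na.rm = TRUE)   # 4
sd(x, na.rm = TRUE)     # 3
```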

Duncan Murdoch



More information about the R-help mailing list