[R] Unexpected behaviour of plyr::ddply

Mon Sep 29 09:24:47 CEST 2014

On Wed, 17-Sep-2014 at 12:36AM -0300, walmes . wrote:

|> Hello R users,
|> 
|> I'm writing a brief tutorial on computing statistical measures by splitting
|> according to strata and over columns. When I used plyr::ddply I got an
|> unexpected result, with NA/NaN for non-existent cells. Below is minimal
|> reproducible code with the result that I got. For comparison, the result of
|> aggregate is shown. Why this behaviour? What can I do to avoid it?
|> 
|> > require(plyr)
|> >
|> > hab <-
|> +     read.table("http://www.leg.ufpr.br/~walmes/data/ipea_habitacao.csv",
|> +                header=TRUE, sep=",", stringsAsFactors=FALSE, quote="",
|> +                encoding="utf-8")
|> >
|> > hab <- hab[,-ncol(hab)]
|> > names(hab) <- c("sig", "cod", "mun", "agua", "ener", "tel", "carro",
|> +                 "comp", "tot")
|> > hab <- transform(hab, sig=factor(sig))
|> > hab$siz <- cut(hab$tot, breaks=c(-Inf, 5000, Inf),
|> +                labels=c("P","G"))

However:
> summary(hab$tot)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
    227    1328    2640    8264    5440 3039000      89 

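Those NAs interfere with the cut() statement: cut() returns NA for each
of those 89 rows, so hab$siz ends up with missing values in what is
meant to be a grouping variable.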
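
To see the effect in isolation, here is a small sketch using a made-up
vector in place of hab$tot (the values are invented for illustration and
are not taken from the data set). It only shows that cut() passes NA
through rather than assigning it to either interval:

```r
# Toy stand-in for hab$tot; these numbers are invented, not from the real data
tot <- c(227, 2640, NA, 8264, NA)

# Same breaks and labels as in the original post
siz <- cut(tot, breaks = c(-Inf, 5000, Inf), labels = c("P", "G"))

# NA inputs stay NA in the resulting factor
print(is.na(siz))

# useNA = "ifany" makes the missing group visible in the tabulation
print(table(siz, useNA = "ifany"))
```

Tabulating with useNA = "ifany" is a quick way to check whether a
grouping factor carries missing values before handing it to ddply.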

The simplest workaround is

> hab <- na.omit(hab)

Then ddply will play nicely.
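
One thing to keep in mind is that na.omit() drops every row that has an
NA in any column, not only in tot. If that discards more than you want,
a possible alternative is to filter on tot alone. The sketch below uses
a small invented data frame (hab_toy and its values are hypothetical,
chosen only to show the difference in row counts):

```r
# Hypothetical miniature of hab; values are invented for illustration
hab_toy <- data.frame(
  sig = c("PR", "PR", "SC", "SC"),
  tot = c(1200, NA, 9000, 300),
  tel = c(10, 20, NA, 5),
  stringsAsFactors = FALSE
)

# na.omit() removes rows with an NA in ANY column (rows 2 and 3 here)
all_complete <- na.omit(hab_toy)

# Filtering on tot alone removes only the row where tot is missing (row 2)
tot_only <- hab_toy[!is.na(hab_toy$tot), ]

# With tot free of NAs, cut() yields a grouping factor with no missing values
tot_only$siz <- cut(tot_only$tot, breaks = c(-Inf, 5000, Inf),
                    labels = c("P", "G"))

print(nrow(all_complete))
print(nrow(tot_only))
print(tot_only)
```

Which of the two is appropriate depends on whether the other columns'
NAs matter for the summaries being computed; rows kept this way may
still need na.rm = TRUE in the summary functions.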

HTH

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___    Patrick Connolly   
 {~._.~}                   Great minds discuss ideas    
 _( Y )_  	         Average minds discuss events 
(:_~*~_:)                  Small minds discuss people  
 (_)-(_)  	                      ..... Eleanor Roosevelt

~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.