[R] MASS fitdistr with plyr or data.table?

Dennis Murphy djmuser at gmail.com
Thu Apr 28 01:31:22 CEST 2011


Hi:

Here's one way to do this with plyr and data.table.

# plyr
As Hadley inferred, when using ddply(), it's convenient to write a
function for a generic (sub-)data frame and have it return a data
frame. Here's my function and call:

f <- function(d) {
     require(MASS)
     est <- fitdistr(d$wind_speed, 'weibull')$estimate
     data.frame(shape = est[1], scale = est[2])
  }

Notice that f() takes a data frame d as input, uses the wind_speed
component of d to fit the Weibull, and returns a data frame with names
for the estimates.

> ddply(weib.test.too, 'site', f)
   site    shape    scale
1     1 3.063853 8.049467
2     2 2.945982 8.067252
3     3 2.879392 7.999636
4     4 3.097191 8.084453
5     5 3.091117 8.012450
6     6 2.943254 7.912792
7     7 2.957455 7.947545
8     8 2.975732 7.901587
9     9 3.045563 8.061838
10   10 2.995324 8.056820
Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
2: In dweibull(x, shape, scale, log) : NaNs produced
<snip the other eight>

## data.table

In writing a function for data.table, you want a variable as input and
a list as output. We therefore modify the above function slightly:

g <- function(x) {
     require(MASS)
     est <- fitdistr(x, 'weibull')$estimate
     list(shape = est[1], scale = est[2])
  }

library(data.table)
weib.test.dt <- data.table(weib.test.too, key = 'site')
> weib.test.dt[, g(wind_speed), by = site]
      site    shape    scale
 [1,]    1 3.063853 8.049467
 [2,]    2 2.945982 8.067252
 [3,]    3 2.879392 7.999636
 [4,]    4 3.097191 8.084453
 [5,]    5 3.091117 8.012450
 [6,]    6 2.943254 7.912792
 [7,]    7 2.957455 7.947545
 [8,]    8 2.975732 7.901587
 [9,]    9 3.045563 8.061838
[10,]   10 2.995324 8.056820
## <warnings snipped>

HTH,
Dennis


On Wed, Apr 27, 2011 at 1:55 PM, Justin Haynes <jtor14 at gmail.com> wrote:
> I am trying to extract the shape and scale parameters of a wind speed
> distribution for different sites.  I can do this in a clunky way, but
> I was hoping to find a way using data.table or plyr.  However, when I
> try I am met with the following:
>
> set.seed(144)
> weib.dist<-rweibull(10000,shape=3,scale=8)
> weib.test<-data.table(cbind(1:10,weib.dist))
> names(weib.test)<-c('site','wind_speed')
>
> fitted<-weib.test[,fitdistr(wind_speed,'weibull'),by=site]
>
> Error in class(ans[[length(byval) + jj]]) = class(testj[[jj]]) :
>  invalid to set the class to matrix unless the dimension attribute is
> of length 2 (was 0)
> In addition: Warning messages:
> 1: In dweibull(x, shape, scale, log) : NaNs produced
> ...
> 10: In dweibull(x, shape, scale, log) : NaNs produced
>
> (the warning messages are normal from what I can tell)
>
> or using plyr:
>
> set.seed(144)
> weib.dist<-rweibull(10000,shape=3,scale=8)
> weib.test.too<-data.frame(cbind(1:10,weib.dist))
> names(weib.test.too)<-c('site','wind_speed')
>
> fitted<-ddply(weib.test.too,.(site),fitdistr,'weibull')
>
> Error in .fun(piece, ...) : 'x' must be a non-empty numeric vector
>
> those sound like similar errors to me, but I can't figure out how to
> make them go away!
>
> to prove I'm not crazy:
>
> fitdistr(weib.dist,'weibull')$estimate
>   shape    scale
> 2.996815 8.009757
> Warning messages:
> 1: In dweibull(x, shape, scale, log) : NaNs produced
> 2: In dweibull(x, shape, scale, log) : NaNs produced
> 3: In dweibull(x, shape, scale, log) : NaNs produced
> 4: In dweibull(x, shape, scale, log) : NaNs produced
>
> Thanks
>
> Justin
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list