[R] non-linear binning? power-law in R

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Wed Jun 16 14:21:47 CEST 2004


On Wed, 16 Jun 2004, Sean Davis wrote:

>Is ?cut what you need?

This is giving the cleanest results yet.
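
For anyone finding this in the archive, here is a minimal sketch of how cut() could be used with bins that grow geometrically. It is only an illustration and not necessarily the exact call used here; the half-decade spacing and the names `breaks`, `bins` and `counts` are assumptions.

```r
set.seed(1)
x3 <- runif(10000)^3                     # same toy data as in the original post

# Bin edges spaced evenly on a log10 scale, from just below min(x3) up to 1.
# (Half-decade steps are an arbitrary choice for this sketch.)
breaks <- 10^seq(floor(log10(min(x3))), 0, by = 0.5)

# cut() turns each value into a factor level naming the bin it falls in.
bins <- cut(x3, breaks = breaks, include.lowest = TRUE)

# table() then counts how many values landed in each bin.
counts <- table(bins)
print(counts)
```

Because the bins have unequal widths, the raw counts would still need dividing by the bin width before being read as a density.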

Cheers,
Dan.


>
>Sean
>
>
>On 6/16/04 6:52 AM, "Dan Bolser" <dmb at mrc-dunn.cam.ac.uk> wrote:
>
>> 
>> First, thanks to everyone who helped me get to grips with R in (x)emacs
>> (I get confused easily). Special thanks to Stephen Eglen for continued
>> support.
>> 
>> My question is about non-linear binning, or density functions over
>> distributions governed by a power law ...
>> 
>> y ~ mu*x**lambda    # In one of its forms
>>                       # (can't find Pareto in the online help)
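
As far as I know base R has no built-in Pareto functions, but Pareto variates can be drawn by inverting the CDF F(x) = 1 - (xmin/x)^alpha. The sketch below assumes that parameterisation; `rpareto_sketch`, `xmin` and `alpha` are hypothetical names, not part of any package.

```r
# Inverse-transform sampling: if U ~ Uniform(0,1), then
# xmin * U^(-1/alpha) follows a Pareto(xmin, alpha) distribution.
rpareto_sketch <- function(n, xmin = 1, alpha = 2) {
  xmin * runif(n)^(-1 / alpha)
}

set.seed(42)
x <- rpareto_sketch(10000, xmin = 1, alpha = 2)

# Sanity check: the theoretical median is xmin * 2^(1/alpha).
print(median(x))
print(2^(1 / 2))
```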
>> 
>> Looking at the following should show my problem....
>> 
>> x3 <- runif(10000)**3    # Probably a better (correct) way to do this
>> 
>> plot( density(x3,cut=0,bw=0.1))
>> plot( density(x3,cut=0,bw=0.01))
>> plot( density(x3,cut=0,bw=0.001))
>> 
>> plot(density(x3,cut=0,bw=0.1),  log='xy')
>> plot(density(x3,cut=0,bw=0.01), log='xy')
>> plot(density(x3,cut=0,bw=0.001),log='xy')
>> 
>> The upper three plots show that the bw has a big effect on the appearance
>> of the graph by rescaling based on the initial density at low values of x,
>> which is very high.
>> 
>> The lower plots show (I think) an error in the use of linear bins to view
>> a non-linear trend. I would expect this curve to be linear on log-log
>> scales (from experience), and you can see the expected behavior in the
>> tails of these plots.
>> 
>> If you play with drawing these curves on top of each other they look OK
>> apart from at the beginning. However, changing the bandwidth to 0.0001 has
>> a radical effect on these plots, and they begin to show a different trend
>> (they look like they are being governed by a different power).
>> 
>> Hmmm....
>> 
>> x3log <- -log(x3)
>> 
>> plot( density(x3log,cut=0,bw=0.5),  log='y',col=1)
>> 
>> lines(density(x3log,cut=0,bw=0.2),  col=2)   # lines() draws on the log axis set by plot()
>> lines(density(x3log,cut=0,bw=0.1),  col=3)
>> lines(density(x3log,cut=0,bw=0.01), col=4)
>> 
>> Sorry...
>> 
>> 
>> 'Real' data of this form is usually discrete, with the value of 1 being
>> the most frequent (minimum) event, and higher values occurring less
>> frequently according to a power (power-law). This data can be easily
>> grouped into discrete bins, and frequency plotted on log scales. The
>> continuous data generated above requires some form of density estimation
>> or rescaling into discrete values (make the smallest value equal to 1 and
>> round everything else into an integer).
>> 
>> I see the aggregate function, but which function lets me simply count the
>> number of values in a class (integer bin)?
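
A small sketch of counting values per integer class with table() and tabulate(). The vector `x` is made-up data for illustration only.

```r
# Made-up discrete data where 1 is the most frequent value.
x <- c(1, 1, 1, 2, 2, 5, 1, 3, 2, 1)

# table() counts only the values that actually occur.
counts <- table(x)
print(counts)

# tabulate() counts every integer bin from 1 to max(x),
# including empty ones (here, bin 4).
full <- tabulate(x)
print(full)
```

table() is usually more convenient for plotting, since names(counts) gives the observed values; tabulate() is handy when empty bins need to be kept.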
>> 
>> The analysis of even the discretized data is made more accurate by the use
>> of exponentially growing bins. This way you don't need to plot the data on
>> log scales, and the increasing variance associated with lower probability
>> events is handled by the increasing bin size (giving good accuracy of
>> power fitting). How can I easily (ignorantly) implement exponentially
>> increasing bin sizes?
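
One possible way to get exponentially growing bins is to hand hist() a vector of doubling break points and read off the width-normalised density. This is a sketch under assumptions: the doubling factor, the cutoff of 10 counts per bin and the names `breaks`, `h`, `keep` and `fit` are all arbitrary choices. For x3 = U^3 the true density is (1/3) * x^(-2/3), so the fitted log-log slope should land near -2/3.

```r
set.seed(1)
x3 <- runif(10000)^3

# Bin edges that double in width: 2^k from just below min(x3) up to 1.
breaks <- 2^seq(floor(log2(min(x3))), 0)

# With unequal breaks, hist() reports density = count / (n * bin width).
h <- hist(x3, breaks = breaks, plot = FALSE)

# Drop sparsely populated bins, whose estimates are very noisy.
keep <- h$counts >= 10

# A straight-line fit on log-log scales estimates the power-law exponent.
fit <- lm(log10(h$density[keep]) ~ log10(h$mids[keep]))
slope <- unname(coef(fit)[2])
print(slope)

# To look at it: plot(h$mids[keep], h$density[keep], log = "xy")
```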
>> 
>> Thanks for any feedback,
>> 
>> Dan.
>> 
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>> 
>



