[Rd] Rounding multinomial proportions

Arni Magnusson arnima at hafro.is
Thu Feb 11 11:26:40 CET 2010


Ugh, I made a typo at the very heart of my message:

"when I preprocess each line in R as p<-a/sum(a), occasionally a line will 
sum to 0.999, 1.002, or the like"

should be

"when I preprocess each line in R as p<-round(a/sum(a),3), occasionally a 
line will sum to 0.999, 1.002, or the like"

Also, the first paragraph should end with "where the other multinomial 
functions reside."
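The corrected statement is easy to reproduce. This sketch is written in Python rather than R purely for illustration. The hypothetical helper rounded_proportions() mimics round(a/sum(a),3) by rounding each proportion independently. Equal counts in three or six categories are enough to make the rounded row sum to 0.999 or 1.002. Neither case involves an exact tie, so differences between Python's and R's tie-breaking rules should not matter here.

```python
def rounded_proportions(counts, digits=3):
    """Hypothetical helper mimicking R's round(a/sum(a), digits):
    convert counts to proportions, then round each one independently."""
    total = sum(counts)
    return [round(c / total, digits) for c in counts]


for counts in ([1, 1, 1], [1, 1, 1, 1, 1, 1]):
    p = rounded_proportions(counts)
    # Round the row sum to `digits` places only to strip binary float noise.
    print(counts, p, round(sum(p), 3))
```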

Revision 2,

Arni



On Thu, 11 Feb 2010, Arni Magnusson wrote:

> I present you with a function that solves a problem that has bugged me 
> for many years. I think the problem may be general enough to at least 
> consider adding this function, or a revamped version of it, to the 
> 'stats' package, with the other multinomial functions reside.
>
> I'm using R to export data to text files, which are input data for an 
> external model written in C++. Parts of the data are age distributions, 
> in the form of relative frequency in each year:
>
>  Year  Age1   Age2   ...  Age10
>  1980  0.123  0.234  ...  0.001
>  ...   ...    ...    ...  ...
>
> Each row should sum to exactly 1. The problem is that when I preprocess 
> each line in R as p<-a/sum(a), occasionally a line will sum to 0.999, 
> 1.002, or the like. This could either crash the external model or lead 
> to wrong conclusions.
>
> I believe similar partitioning is commonly used in a wide variety of 
> models, making this a general problem for many modellers.
>
> In the past, I checked every line manually and then arbitrarily 
> tweaked one or two values up or down to make the row sum to exactly one, 
> but two people would tweak differently. Another semi-solution is to 
> write the values to the text file with many more decimal places, but 
> this would (1) make it harder to visually check the numbers and (2) mean 
> that the numbers in the article or report would no longer match the data 
> files exactly, so other scientists could not repeat the analysis and get 
> the same results.
>
> I once implemented a quick-and-dirty solution, simply setting the last 
> proportion (Age10 above) to 1 minus the sum of ages 1-9. I quickly 
> stopped using that approach when I started seeing negative values.
>
> After this introduction, the attached round_multinom.html should make 
> sense. The algorithm I ended up choosing comes from allocating seats in 
> elections, so I was tempted to provide that application as well, 
> although it makes the interface and documentation slightly more 
> confusing.
>
> The working title of this function was the short and catchy vote(), but I 
> changed it to round_multinom(), even though it's not matrix-oriented 
> like the other *multinom functions. Making it matrix-oriented would 
> probably be straightforward, but I'll keep it as a vector function 
> during the initial discussion.
>
> I'm curious to hear your impressions and ideas. In the worst case, this 
> is a not-so-great solution to a marginal problem. In the best case, this 
> might be worth a short note in the Journal of Statistical Software.
>
> Thanks for your time,
>
> Arni
>
> P.S. In case the mailing list doesn't handle attachments, I've placed 
> the same files on http://www.hafro.is/~arnima/ for your convenience.
>
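The message does not say which seat-allocation rule round_multinom() implements. What follows is only a sketch of one well-known rule, the largest remainder (Hamilton) method, written in Python for illustration. The names largest_remainder_round() and last_as_remainder() are hypothetical, and the input is assumed to be non-negative integer counts with a positive total. Treating 10^digits as the number of seats, each category first gets the floor of its exact quota, and the seats still unallocated go to the categories with the largest remainders. The result therefore sums to exactly 1 at the chosen precision and can never be negative. For contrast, last_as_remainder() reproduces the "last proportion = 1 minus the rest" shortcut, which goes negative on the constructed ten-category row below.

```python
def largest_remainder_round(counts, digits=3):
    """Hypothetical sketch: round counts/sum(counts) to `digits` decimals so
    the results sum to exactly 1, using largest remainder (Hamilton)
    apportionment with 10**digits 'seats'. Assumes non-negative integer
    counts with a positive total."""
    total = sum(counts)
    seats = 10 ** digits
    # Exact integer arithmetic: floor and remainder of c * seats / total.
    floors = [c * seats // total for c in counts]
    remainders = [c * seats % total for c in counts]
    leftover = seats - sum(floors)  # between 0 and len(counts) - 1
    # One extra seat for each of the `leftover` largest remainders; sorted()
    # is stable even with reverse=True, so ties go to the earlier category.
    order = sorted(range(len(counts)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        floors[i] += 1
    return [f / seats for f in floors]


def last_as_remainder(counts, digits=3):
    """Hypothetical sketch of the shortcut described above: round all but
    the last proportion, then set the last one to 1 minus their sum."""
    total = sum(counts)
    head = [round(c / total, digits) for c in counts[:-1]]
    return head + [round(1 - sum(head), digits)]


counts = [1106] * 8 + [1126, 26]  # ten age classes, the oldest one rare
print(last_as_remainder(counts))  # last entry is negative (-0.001)
p = largest_remainder_round(counts)
print(p, round(sum(p), 3))  # sums to 1 at 3 decimals, all entries >= 0
print(largest_remainder_round([1, 1, 1]))
```

Ties in the remainders (every category ties in the constructed row) are broken by position here. A real implementation would need to choose and document its own tie rule.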
