[R] What distribution is related to hypergeometric?

Ted Byers r.ted.byers at gmail.com
Thu Sep 25 16:45:51 CEST 2008


I have been reading, in various sources, that the Poisson distribution is
related to the binomial, extending the idea to counts of events in a given
period of time.
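Here is a minimal Python sketch of the binomial-to-Poisson link mentioned above. If the number of trials n grows while the expected count mu = n*p stays fixed, the Binomial(n, mu/n) probabilities approach the Poisson(mu) probabilities. The helper names and the choice mu = 3 are illustrative assumptions and do not come from the original thread.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, mu: float) -> float:
    """P(exactly k events) for a Poisson count with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def max_abs_diff(n: int, mu: float, kmax: int = 20) -> float:
    """Largest gap between Binomial(n, mu/n) and Poisson(mu) pmfs over k = 0..kmax."""
    p = mu / n
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(kmax + 1))


if __name__ == "__main__":
    # The gap should shrink as n grows with the mean held at mu = 3.
    for n in (10, 100, 1000, 10000):
        print(n, max_abs_diff(n, mu=3.0))
```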

In my case, the hypergeometric distribution seems more appropriate, but I
need a temporal dimension to the distribution.

I have weekly samples of two kinds of events: call them A and B.  I have a
count of A events.  These change dramatically from one week to the next.  I
also have weekly counts of B events that I can relate to A events.  Some
fraction 'lambda' (between 1 and 1) of A events will result in B events some
time in the future (but also sometimes in the same week that the related A
event occured).  The B event related to a given A event can occur as much as
ten weeks after the A event.  B events can not occur without a prior A
event, and well over half of the A events will never produce a B event. 
Also, we know that a given A event can not produce more than one B event. 
Hence the hypergeometric is much more appropriate than the binomial, and thus
my need for the distribution that has the same relation to the hypergeometric
that the Poisson has to the binomial.  Since the hypergeometric is related to
the binomial, would the Poisson also be related to the hypergeometric?
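For reference, the hypergeometric is tied to the binomial by a similar limit. Drawing n items without replacement from N items, K of which are "successes", behaves like Binomial(n, K/N) once N is much larger than n. Chaining that limit with the binomial-to-Poisson limit is one way the Poisson connects to the hypergeometric. The sketch below only measures the pointwise gap as N grows. Its helper names are hypothetical, and the values n = 20 and K/N = 0.3 are arbitrary.

```python
import math


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k successes) when n items are drawn without replacement from N items, K of which are successes."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(k successes) in n independent draws, i.e. sampling with replacement."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def max_gap(N: int, n: int = 20, p: float = 0.3) -> float:
    """Largest pointwise gap between Hypergeometric(N, K=round(p*N), n) and Binomial(n, K/N)."""
    K = round(p * N)  # number of "success" items in the finite population
    return max(
        abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, K / N)) for k in range(n + 1)
    )


if __name__ == "__main__":
    # Larger populations make "without replacement" look like "with replacement".
    for N in (50, 500, 5000):
        print(N, max_gap(N))
```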

My data is best expressed as a fraction: the number of B events in a given
week divided by the number of A events producing the B events.  I.e. if there
are 500 A events in week n, the data would be the number of related B events
in week m (m >= n) divided by 500.  The first table I get from the DB has
records containing an ordered pair: week number, fraction.  E.g.

0,0.2
1,0.3
2,0.25
3,0.2
...
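This sketch shows how one cohort's rows could be built. It assumes the week column is the lag m - n in weeks. The function name is hypothetical, and the B counts 100, 150, 125 and 100 are invented so that a cohort of 500 A events reproduces the dummy table above.

```python
def cohort_fractions(n_a: int, b_counts_by_lag: list[int]) -> list[tuple[int, float]]:
    """Turn one cohort's B counts into (week, fraction) rows.

    n_a is the number of A events in the cohort's starting week n;
    b_counts_by_lag[j] is the number of related B events seen in week n + j
    (an assumed layout, for illustration only).
    """
    if n_a <= 0:
        raise ValueError("a cohort needs at least one A event")
    return [(lag, b / n_a) for lag, b in enumerate(b_counts_by_lag)]


if __name__ == "__main__":
    # Invented counts chosen so that 500 A events give the dummy table above.
    for week, frac in cohort_fractions(500, [100, 150, 125, 100]):
        print(f"{week},{frac}")
```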

The above is dummy data, but the pattern I see in the data is that the
number of B events in week 0 is less than the number of B events in week 1,
but from then on, the number of B events declines exponentially (as you'd
expect from what could be described as a decay process, altered to reflect
the fact that over half of the original A events will never produce B
events).  Of all the distributions I tried on this data, exponential and
Poisson produced the best fits, with very little to choose between them.
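A simple way to put a number on the exponential decline is an ordinary least-squares fit of log(fraction) against week. This sketch assumes the model fraction ~ a * exp(-k * week) and fits it only from week 1 onward, because week 0 sits below the peak. The function name and the synthetic test data are hypothetical.

```python
import math


def fit_exp_decay(weeks: list[float], fractions: list[float]) -> tuple[float, float]:
    """Fit fraction ~ a * exp(-k * week) by ordinary least squares on log(fraction).

    Returns (a, k). Every fraction must be strictly positive, and at least
    two distinct weeks are needed.
    """
    if len(weeks) != len(fractions) or len(set(weeks)) < 2:
        raise ValueError("need matching lists with at least two distinct weeks")
    ys = [math.log(f) for f in fractions]
    n = len(weeks)
    mx = sum(weeks) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in weeks)
    sxy = sum((x - mx) * (y - my) for x, y in zip(weeks, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic, noise-free decay from week 1 onward (week 0 left out on purpose).
    weeks = list(range(1, 11))
    fracs = [0.3 * math.exp(-0.4 * w) for w in weeks]
    print(fit_exp_decay(weeks, fracs))  # approximately (0.3, 0.4)
```

Fitting on the log scale gives small fractions more weight than a fit on the original scale would. If that matters, nonlinear least squares (for example R's `nls`) is the usual alternative.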

The cumulative fraction of A events that have produced B events always
approaches an asymptote between 0.25 and 0.45, never higher, but now it
looks like the asymptotes are getting smaller (the behaviour of the system
is changing).
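One way to see why the cumulative fraction levels off rather than reaching 1 is to simulate the process. Suppose each A event independently produces at most one B event with probability lambda, and the delay (0 to 10 weeks) is drawn from some delay distribution. The cumulative fraction then tends to lambda. This sketch uses an independent per-event model, which is a binomial thinning rather than a hypergeometric one. The delay weights, lambda = 0.35 and the function names are hypothetical.

```python
import random

# Hypothetical delay distribution over lags 0..10 weeks (weights sum to 1):
# lag 1 is the most likely, then the weights fall off.
DELAY_WEIGHTS = [0.2, 0.3, 0.2, 0.12, 0.08, 0.05, 0.03, 0.01, 0.005, 0.003, 0.002]


def simulate_cohort(n_a: int, lam: float, delay_weights: list[float], seed: int = 0) -> list[float]:
    """Simulate one cohort of n_a A events.

    Each A event independently produces exactly one B event with probability
    lam (otherwise none); the lag of that B event is drawn from delay_weights.
    Returns the fraction of the cohort whose B event falls at each lag.
    """
    rng = random.Random(seed)
    lags = range(len(delay_weights))
    counts = [0] * len(delay_weights)
    for _ in range(n_a):
        if rng.random() < lam:
            lag = rng.choices(lags, weights=delay_weights)[0]
            counts[lag] += 1
    return [c / n_a for c in counts]


def cumulative(xs: list[float]) -> list[float]:
    """Running totals: the cumulative fraction of A events that have produced a B event."""
    out, total = [], 0.0
    for x in xs:
        total += x
        out.append(total)
    return out


if __name__ == "__main__":
    per_week = simulate_cohort(100_000, lam=0.35, delay_weights=DELAY_WEIGHTS, seed=1)
    print([round(c, 3) for c in cumulative(per_week)])  # levels off near lam = 0.35
```

In this model, a shrinking asymptote corresponds to a shrinking lambda. The shape of the weekly curve comes from the delay weights alone.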

In a sense, this breaks down into two questions:  
1) What distribution should I try to fit to my data?  
2) How do I present my data to the functions that will try to fit the
distribution to this data?

The reason for the second is that, while I have examined lots of functions
(fBasics, MASS, &c.) that will try to fit a distribution to data, they all
seem to expect a 1D vector of raw observations, and none of them says
anything about the expected form of the data, or what to do if you already
have an empirical (cumulative) distribution.
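For maximum-likelihood fitting, the raw vector is not strictly needed. The log-likelihood of a raw sample equals a frequency-weighted sum over its distinct values, so a (value, frequency) table carries the same information. The sketch below shows this for a Poisson fit to the first rows of the dummy table scaled to 1000. Treating the week number as the Poisson count is only an illustration, and the function names are hypothetical.

```python
import math


def poisson_mle_from_table(values: list[int], freqs: list[int]) -> float:
    """Poisson MLE of the mean from a frequency table: the frequency-weighted mean."""
    total = sum(freqs)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    return sum(v * f for v, f in zip(values, freqs)) / total


def poisson_loglik_from_table(mu: float, values: list[int], freqs: list[int]) -> float:
    """Log-likelihood of the expanded sample, computed from (value, frequency) pairs.

    Each distinct value v contributes freq * log P(v), so the raw vector
    never has to be built.
    """
    return sum(f * (v * math.log(mu) - mu - math.lgamma(v + 1)) for v, f in zip(values, freqs))


if __name__ == "__main__":
    weeks, counts = [0, 1, 2, 3], [200, 300, 250, 200]  # first rows of the dummy table x 1000
    mu_hat = poisson_mle_from_table(weeks, counts)
    print(mu_hat, poisson_loglik_from_table(mu_hat, weeks, counts))
```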

To try out the functions that fit distributions, I created a dummy vector
where the initial sample size was 1000, and the number of values equal to a
given week number would be 1000 * the fraction of A events that produced B
events.  E.g., using the sample numbers above, there'd be 200 '0's, 300
'1's, 250 '2's, &c.
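The expansion just described can be sketched as follows. The function name is hypothetical. In R, the same idea is `rep(c(0, 1, 2, 3), times = round(1000 * c(0.2, 0.3, 0.25, 0.2)))`.

```python
def expand_table(weeks: list[int], fractions: list[float], sample_size: int = 1000) -> list[int]:
    """Build the dummy vector: week w repeated round(sample_size * fraction) times."""
    out: list[int] = []
    for w, f in zip(weeks, fractions):
        # round(), not int(): sample_size * f can land a hair below an integer in floating point.
        out.extend([w] * round(sample_size * f))
    return out


if __name__ == "__main__":
    v = expand_table([0, 1, 2, 3], [0.2, 0.3, 0.25, 0.2])
    print(len(v), v.count(0), v.count(1), v.count(2), v.count(3))  # 950 200 300 250 200
```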

Thanks

Ted
-- 
View this message in context: http://www.nabble.com/What-distribution-is-related-to-hypergeometric--tp19671054p19671054.html
Sent from the R help mailing list archive at Nabble.com.


