[R] Estimating Mean of a Poisson Distribution Based on interval censoring

mohsen hs mohsenhs82 at yahoo.com
Tue Feb 16 06:14:12 CET 2016


Hi Terry,
Thank you for your reply and your time. I really appreciate your time and I know that you should be very busy.
Please forgive me for any possible technical mistake as I am not a professional in statistic.
R has a package with the name of fitdist that gave me the estimation. It seems that lifereg does not support Poisson at list without any specific modification,
(https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_lifereg_sect019.htm).
Please let me explain my problem which you can find below. Since, you are interested to know the problem, I tried to explain the problem clearly. Please forgive me if it is too long:( . I really appreciate your time, consideration, and feedback.


Many thanks,Mohsen
We collected the data from some devices and they are able to count the data within a minute. For example, we had 10 observations in 12:01, 15 in 12:02, and so forth. I randomly distributed the data (10 obs) in the first 60 sec, and the 15 obs in (60 to 120 sec) and count the interarrival of obs. In this case, I got a result like this:1, 2002, 4003, 150The first row means I had 200 objects that arrived with the difference of 1 second, 400 objects with the difference of two seconds (interarrival rate was 2 sec for 400 objects), and so on. To do that, I wrote a function to repeat 1, two hundred times, repeat 2, four hundred times, and ... Now, I would like to investigate which distribution they follow. Since some of the devices were faulty and  for some other reasons such as tails of QQ plot, I decided to censor some of the data, and those are interarrival(obs)<3 sec and interarrival(obs)>500 sec. 

On a different topic related to difference between R and SAS, I have provided you with the mu, sigma, and codes.

In SAS and R, I used interval censoring with the boundaries of 3, and 500 and formed a table like this:

        count & t1  & t2           1     & 1   & 3             1     & 1   & 3             2     & 2   & 3             3     & 3   & 3            3     & 3   & 3             4     & 4   & 4             ..    & ..  & ..            500   & 500 & 500          501   & 500 & 501           ..    & ..  & ..           39011 & 500 & 39011

count is the observed value and is the response variable, t1 is the bound of right censoring and t2 is the bound of left censoring. This is done by the following command in SAS
 loss count/ lc=t2 rc=t1;

SAS estimated:
 $\mu=3.67361$ , $\sigma=1.45265$ http://support.sas.com/documentation/cdl/en/etsug/63939/HTML/default/viewer.htm#etsug_severity_sect037.htm      The LOSS statement specifies the left and right boundary of each group as the       right-censoring and left-censoring limits, respectively.       lc=t2 rc=t1;      t1   t2      1    3      1    3      500  600      500  4500
SAS code,/////////////////////////////
                 data df;                                                                                                                                    infile "/home/Username/PathAll_TWOMONTH _BothDirection715_2.csv"                                                                              LRECL=10000000  DLM=',' firstobs=2;                                                                                                    input                                                                                                                                  timenumn count right left                                                                                                              ;                                                                                                                                      run;       data t1(drop=count left right rename=(timenumn=t1)) ;                                                                                                                                 set df;                                                                                                                                      do i = 1 to count;                                                                                                                           output;                                                                                                                                      end;                                                                                                                                         run;           data t2(drop=count left right rename=(timenumn=t2));                                                                                                                                      set df;                                                                                                                                      do i = 1 to count;                                                                                                                           output;                                                                                                                                      end;                                                                                                                                         run;           data count(drop=count left right rename=(timenumn=count)) ;                                                                                                                                     set df;                                                                                                                                      do i = 1 to count;                                                                                                                           output;                                                                                                                                      end;                                                                                                                                         run;                     data t1;     set t1;     /*If t1 < 3 then t1 = 1;     if t1>1000 then t1=1000;*/     if (t1<3 ) then t1=1;     if(t1>2) then t1=t1;     if(t1>500) then t1=500;     run;          data t2;     set t2;     /*If t2 < 3 then t2 = 5;     if t2 >1000 then t2=t2;*/     if(t2<3) then t2=3;     /*if(t2<501) then t2=t2;     if (t2>500 ) then t2=500;*/     run;     ;     /*     data count;     set count;     if count <100005 then count=1     run;     ;     */     data final;     merge t1 t2 count;     run;          proc severity data=final print=all plots(histogram kernel)=all     criterion=ad;     loss count/ lc=t2 rc=t1;           dist logn; /* _predef_; */     run;

| Parameter Estimates |
| Parameter | Estimate | Standard
Error | t Value | Approx
Pr > |t| |
| Mu | 3.67361 | 0.00859 | 427.50 | <.0001 |
| Sigma | 1.45265 | 0.00615 | 236.07 | <.0001 |

////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
I decided to compare the $\mu$ and $\sigma$ values from R and SAS. I got the following values from R which is slightly different:
$\mu=3.653001$$\sigma=1.497570$
////////R codelibrary(fitdistrplus)df= read.csv ("E:/Motorway-Urban/hour/PathAll_TWOMONTH _BothDirection715_2.csv")z=rep(df$timenum,time=df$count)y<-zycens <- data.frame(left=y,right=y)max=27219ct=maxfor(i in max:28666 ){    ycens$right[ct]=y[ct]    ycens$left[ct]=500    ct=ct+1}
ct=1;for(i in 1:28666 ){        if( ycens$left[i]<3)    {        ycens$left[ct]=1            }    ct=ct+1}fitlnc<-fitdistcens(ycens,"lnorm")Fitting of the distribution ' lnorm ' on censored data by maximum likelihood Parameters:        estimatemeanlog 3.653001sdlog   1.497570
   

    On Tuesday, February 16, 2016 1:38 AM, "Therneau, Terry M., Ph.D." <therneau at mayo.edu> wrote:


 For an interval censored poisson or lognormal, use survreg() in the survival package.  (Or 
if you are a SAS fan use proc lifereg).  If you have a data set where R and SAS give 
different answers I'd like to know about it, but my general experience is that this is 
more often a user error.  I am also curious to learn exactly what you mean by "interval 
censored poisson".  Exponential with interval time to first event is equivalent to 
poisson, which is what I'd guess from "lognormal", but you may mean something else.


Terry Therneau
(author of survival)



On 02/14/2016 05:00 AM, r-help-request at r-project.org wrote:
> Dear all,
> I appreciate that if you let me know if there is any package implemented in R for Estimating Mean of a Poisson Distribution Based on Interval censoring? And if yes, could you please provide some information about it:)?
> By the way, is there anything for lognormal?I think fitdistcens is not good for this purpose as it gives me different result compared to SAS and only useful for right/left censoring and not interval censoring?(or both left and right together).?
> Kind regards,Mohsen


  
	[[alternative HTML version deleted]]



More information about the R-help mailing list