[R] question about result of loglinear analysis

Wed Jan 19 13:37:08 CET 2011

> Date: Wed, 19 Jan 2011 01:20:06 -0800
> From: djmuser at gmail.com
> To: laomeng.3 at gmail.com
> CC: r-help at r-project.org
> Subject: Re: [R] question about result of loglinear analysis
>
> Hi:
>
> Well, you fit a saturated model. How many degrees of freedom do you have
> left for error? The fact that the standard errors are so huge relative to
> the estimates is a clue.
>
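
A side note on the saturated-model point, as a minimal sketch on made-up
data (the 7 x 11 layout and the counts are my assumptions, not the posted
file). The str() output further down shows area and nation stored as int.
If that is also true of data_Analysis, glm() treats them as numeric
covariates unless they are wrapped in factor(). Only the factor version is
saturated and leaves zero residual degrees of freedom.

```r
# Synthetic stand-in for the posted data (values are invented).
set.seed(1)
d <- expand.grid(area = 1:7, nation = 1:11)
d$fre <- rpois(nrow(d), lambda = 20)

# Integer codes act as numeric covariates: only 4 coefficients are fitted.
fit_num <- glm(fre ~ area * nation, family = poisson, data = d)

# Factors give one parameter per cell: this is the saturated model.
fit_sat <- glm(fre ~ factor(area) * factor(nation),
               family = poisson, data = d)

df.residual(fit_num)  # 77 observations minus 4 coefficients
df.residual(fit_sat)  # nothing left over for error
```

With no degrees of freedom left, the fit reproduces each cell exactly, so
the standard errors say very little.
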
> Taking a look at your data, it's pretty clear that nation 3 is an
> outstanding outlier on its own. It is clearly - nay, blatantly - different
> from the other nations in the sample. Look at
>
> boxplot(fre ~ nation, data = data_Analysis)
> boxplot(sqrt(fre) ~ nation, data = data_Analysis)

Scrolling back through my cygwin window, this is what I used last night
(I read the data into "x", not data_Analysis):

> x<-read.table("area_nation.txt",header=TRUE)
> str(x)
'data.frame':   77 obs. of  3 variables:
 $ area  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ nation: int  1 2 3 4 5 6 7 8 9 10 ...
 $ fre   : int  0 0 85 2 0 0 0 0 1 0 ...
> library(scatterplot3d)
> library(rgl)
> scatterplot3d(x$area,x$nation,x$fre,type="h")
> scatterplot3d(x$area,x$nation,log(x$fre+1),type="h")
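
If the 3D plots are awkward to read, a plain cross-tabulation shows the same
thing. This is only a sketch: the small data frame below is a hypothetical
stand-in for area_nation.txt, with invented counts, so that it runs on its
own.

```r
# Hypothetical stand-in for area_nation.txt (the counts are invented).
x <- data.frame(area   = rep(1:2, each = 3),
                nation = rep(1:3, times = 2),
                fre    = c(0, 0, 85, 2, 0, 40))

# Area-by-nation table of the frequencies.
tab <- xtabs(fre ~ area + nation, data = x)
tab

# Share of the total count contributed by each nation.
round(prop.table(margin.table(tab, 2)), 2)
```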

There is always a discussion here about "looking at pictures", post hoc
analysis, and what is legitimate to do with outliers. That may be confusing to
some readers, but you always need to keep your overall objectives in mind.
It often helps to forget for a minute that you are doing something intellectual
or pompous and just stare at the pictures (someone else quoted a statistician
talking about getting rat droppings under your fingernails, presumably meaning
getting more familiar with the details of your data acquisition system, LOL).

>
> the latter to deal with the huge outlier near 1200 in the original data.
> Even on the square root scale, nation 3 sticks out like a sore thumb. 43/77
> of your responses have zero frequency, so you should probably be looking
> into zero-inflated Poisson models and some of their relatives. Here is one
> citation to get you started:
>
> http://www.jstatsoft.org/v27/i08/paper
>
> Package VGAM also has functionality to fit these types of models.
>
> Using package sos, I typed
>
> # Install package sos first if you don't have it:
> library(sos)
> findFn('zero Poisson')
>
> which found 255 matches; you should find several packages that pertain to
> zero-inflated/zero-altered Poisson models.
>
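
Before reaching for a zero-inflated model, a quick base-R check is to compare
the observed share of zeros with what a plain Poisson fit would predict. The
sketch below uses simulated counts (the 55% zero share and the Poisson mean
of 5 are assumptions chosen for illustration, not the posted data).

```r
# Simulated counts: a mix of extra zeros and Poisson(5) values (invented).
set.seed(42)
n <- 77
y <- ifelse(runif(n) < 0.55, 0L, rpois(n, lambda = 5))

# Intercept-only Poisson fit as the baseline.
fit <- glm(y ~ 1, family = poisson)

obs_zero <- mean(y == 0)                  # share of zeros in the data
exp_zero <- mean(dpois(0, fitted(fit)))   # share of zeros implied by the fit
c(observed = obs_zero, expected = exp_zero)
```

If the observed share is far above the expected one, that supports looking
at the zero-inflated or hurdle models covered in the paper linked above.
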
> In the absence of the scientific background behind the data, the dominance
> of nation 3 may well mask more subtle effects among the other nations, so
> you might want to consider analyses with and without nation 3.
>
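
The with/without comparison could look something like the sketch below. The
data are simulated, with nation 3 given a much larger mean than the rest
(200 versus 2, both assumptions of mine), and a main-effects model is used
only to keep the example small.

```r
# Simulated 7 x 11 layout where nation 3 dominates (values are invented).
set.seed(7)
d <- expand.grid(area = factor(1:7), nation = factor(1:11))
d$fre <- rpois(nrow(d), ifelse(d$nation == "3", 200, 2))

# Fit on all nations.
fit_all <- glm(fre ~ area + nation, family = poisson, data = d)

# Fit again without nation 3, dropping its now-empty factor level.
d_no3   <- droplevels(subset(d, nation != "3"))
fit_no3 <- glm(fre ~ area + nation, family = poisson, data = d_no3)

# Area effects side by side.
area_terms <- paste0("area", 2:7)
cbind(all = coef(fit_all)[area_terms],
      without_3 = coef(fit_no3)[area_terms])
```
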
> HTH,
> Dennis
>
> On Tue, Jan 18, 2011 at 5:45 PM, Lao Meng wrote:
>
> > Hi all:
> > Here's a question about the result of a loglinear analysis.
> > There are 2 factors: area and nation. The raw data is in the attachment.
> >
> > I fit the saturated loglinear model with the command:
> > glm_sat<-glm(fre~area*nation, family=poisson, data=data_Analysis)
> >
> > After that, I extract the coefficients:
> > result_sat<-summary(glm_sat)
> > result_coe<-result_sat$coefficients
> >
> > I find that all the coefficients are 1 or very near to 1.
> >
> > How does this happen? Why are all the coefficients 1 or very near to 1?
> >
> > Thanks!
> >
> > My best
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>