[R] How can I use IPF function correctly?
David L Carlson
dcarlson at tamu.edu
Mon Jul 30 19:34:03 CEST 2012
I'm not sure what SAS is doing (or if you are using it correctly). In R you
do not create marginal totals independent of the data and try to fit them to
the data. In your first example you create a matrix called raw, but you do
not use it for anything. Your loglin() call is on a table in which every
cell is 16.67, and you then fit a model in which the row and column marginal
totals are used but not the row*column marginal. I'm not really sure what
you are trying to accomplish with that.
In your second example you create three variables and then want to fit
another set of marginal totals that seem to be roughly equal distribution
for rows/columns/pages except that race has four categories but tart.reg has
If the null hypothesis for these data is no interaction between the
variables and that each category should have the same proportion of cases:
age=c(1/3, 1/3, 1/3), gender=c(1/2, 1/2), race=c(1/4, 1/4, 1/4, 1/4), then
mytable <- xtabs(~age+gender+race, rawdat) # table() loses the variable
loglin(mytable, margin=list(0), fit=TRUE)
If you want to preserve the marginal totals for each variable, but not any
interactions between them use
loglin(mytable, margin=list(1, 2, 3), fit=TRUE)
If you want to fit the three two-way interactions use
loglin(mytable, margin=list(c(1, 2), c(2, 3), c(1, 3)), fit=TRUE)
If you want to fit the saturated table (all interactions), use
loglin(mytable, margin=list(c(1, 2, 3)), fit=TRUE)
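
As a rough, self-contained sketch of the calls above: the data frame below
is simulated purely for illustration (the level labels and the sample size
of 300 are assumptions, not the poster's data), and print=FALSE only
suppresses the iteration message.

```r
# Simulated stand-in for rawdat (hypothetical values, for illustration only)
set.seed(1)
rawdat <- data.frame(
  age    = sample(c("18-34", "35-54", "55+"), 300, replace = TRUE),
  gender = sample(c("F", "M"), 300, replace = TRUE),
  race   = sample(c("A", "B", "C", "D"), 300, replace = TRUE)
)
mytable <- xtabs(~ age + gender + race, rawdat)

# One-way margins only: no interactions between the variables
indep <- loglin(mytable, margin = list(1, 2, 3), fit = TRUE, print = FALSE)

# All three two-way margins: no three-way interaction
twoway <- loglin(mytable, margin = list(c(1, 2), c(2, 3), c(1, 3)),
                 fit = TRUE, print = FALSE)

# Saturated model: the fitted table reproduces the observed counts
satur <- loglin(mytable, margin = list(c(1, 2, 3)), fit = TRUE, print = FALSE)

# Fitted one-way margins next to the observed ones for the first model
rbind(observed = apply(mytable, 1, sum), fitted = apply(indep$fit, 1, sum))

# Likelihood-ratio statistics and degrees of freedom for the three models
data.frame(model = c("independence", "two-way", "saturated"),
           lrt   = c(indep$lrt, twoway$lrt, satur$lrt),
           df    = c(indep$df, twoway$df, satur$df))
```

Each fit preserves exactly the margins listed in margin and nothing else, so
comparing lrt and df across the three models shows what each extra set of
interactions buys.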
From: Miao Zhang [mailto:mandyzhangpublic at gmail.com]
Sent: Monday, July 30, 2012 9:35 AM
To: dcarlson at tamu.edu
Subject: Re: [R] How can I use IPF function correctly?
The purpose of doing this is to weight the data to match target values (yes,
I am using percentages instead of counts here). I can get what I need for
2-way tables using the loglin() code below, where I have the row target and
column target values:
newmat1 <- loglin( rowmarg%o%colmarg/sum(colmarg), margin=list(1,2),
start=raw, fit=TRUE, eps=1.e-05, iter=100)$fit
But I am not sure how to extend this to 3 or more dimensions (I will need
higher dimensions later), which is why I am considering iterative
proportional fitting with ipf(). SAS can use an ipf call, but I am not sure
how to apply it in R. Here we could use counts instead of percentages. As an
example, say we have three variables, age, gender and region, and use
frequencies:
### set a rawdata and view the frequency ###
mytable <- table(rawdat$age, rawdat$gender, rawdat$race) # generates a cross-tab of counts
### set target value to weight the frequency 3 dimensions, NOTE, we are
### using counts here not percentage, trying to fit the frequency to the target
f2 <- ipf(mytable, margins=c(1,2,0,1,3,0,2,3), eps = 1e-04, maxits = 50,
showits = TRUE) # no 3 way interaction
Where and how should I input/set my target values here? Any suggestions? Or
do I have to write my own function?
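
One possible way to carry the two-way loglin() call above into three
dimensions, sketched under assumptions: the observed counts and the
equal-share targets below are made up for illustration, and the target
margins are supplied as the outer product of the three target vectors while
the observed table goes in as start. This is only a sketch of that idea, not
a statement of what SAS's ipf call does.

```r
# Hypothetical observed counts: 3 age groups x 2 genders x 4 race categories
observed <- array(c(12, 20,  9, 15,  7, 11,
                     8, 14, 10,  6, 13,  9,
                    16,  5, 12, 10,  8, 14,
                    11,  9,  7, 13, 15,  6),
                  dim = c(3, 2, 4),
                  dimnames = list(age    = c("a1", "a2", "a3"),
                                  gender = c("f", "m"),
                                  race   = c("r1", "r2", "r3", "r4")))
N <- sum(observed)

# Assumed target margins, expressed as counts that each sum to N
age_target    <- N * c(1/3, 1/3, 1/3)
gender_target <- N * c(1/2, 1/2)
race_target   <- N * c(1/4, 1/4, 1/4, 1/4)

# A table whose one-way margins equal the targets (its cells are otherwise
# irrelevant, since only the margins listed in 'margin' are used)
target <- age_target %o% gender_target %o% race_target / N^2

# Iterative proportional fitting: start from the observed table and rescale
# it until its one-way margins match those of 'target'
raked <- loglin(target, margin = list(1, 2, 3), start = observed,
                fit = TRUE, eps = 1e-6, iter = 100, print = FALSE)$fit

# Fitted margins next to the targets
rbind(fitted = apply(raked, 1, sum), target = age_target)
rbind(fitted = apply(raked, 3, sum), target = race_target)

# Cell-level weights implied by the adjustment
weights <- raked / observed
```

Because the fit starts from the observed table, the associations among the
variables in the data are carried over and only the margins are moved to the
targets. Cells that are zero in start stay zero in the fit.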
On Fri, Jul 27, 2012 at 5:11 PM, David L Carlson <dcarlson at tamu.edu> wrote:
It is not clear what you are trying to do. The ipf() function you are using
seems to be the one included in package cat for imputing missing values for
categorical variables. You have not read the instructions for ipf()
carefully, because you have entered the marginal values, not their
dimensions, and you have given ipf() a two-way table but misspecified a
three-way model. No wonder it is confused. Function loglin(), which is part
of the included stats package, also does iterative proportional fitting.
Iterative proportional fitting (ipf) is used for fitting models for
categorical data when there are three or more variables. There is no need
for ipf on a table with two variables since, the values can be directly
Your example data does not include the raw data counts (as it should), but
percentages for each of the 3 x 2 cells (I assume, since they sum to 100).
The marginal values you list (again percentages) are for a model assuming
equal margins. That is easily computed as 1/3*1/2*100 (one third in each row
by one half in each column times 100). So each cell should be 16.667 percent
of the total. Using loglin() that would be specified as follows:
> loglin(raw, margin=list(0), fit=TRUE)
0 iterations: deviation
[1,] 16.66667 16.66667
[2,] 16.66667 16.66667
[3,] 16.66667 16.66667
The lrt and pearson statistics are not valid because you are not using
original counts. Note that the number of iterations is 0 because in a 2 way
model the values are directly computed.
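
A small sketch of the direct computation for the two-way case, using the
percentages from the original post. The object names equal_cells,
indep_direct and indep_loglin are introduced here for illustration only.

```r
# Percentages from the original post (3 rows x 2 columns, summing to 100)
raw <- matrix(c(28.571, 14.286, 23.809, 4.762, 9.523, 19.049), 3, 2,
              byrow = TRUE)

# Equal proportions in every cell: the total spread evenly over 3 * 2 cells
equal_cells <- matrix(sum(raw) / length(raw), nrow(raw), ncol(raw))

# Independence fit that keeps the observed row and column totals,
# computed directly as (row total * column total) / grand total
indep_direct <- outer(rowSums(raw), colSums(raw)) / sum(raw)

# The same model through loglin(); no real iteration is needed in two dimensions
indep_loglin <- loglin(raw, margin = list(1, 2), fit = TRUE,
                       print = FALSE)$fit

round(equal_cells, 3)
all.equal(indep_direct, indep_loglin, check.attributes = FALSE)
```

The lrt and pearson values from such a call remain meaningless here for the
reason given above: the input is percentages, not counts.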
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Miao Zhang
> Sent: Friday, July 27, 2012 6:52 AM
> To: r-help at r-project.org
> Subject: [R] How can I use IPF function correctly?
> Hi All,
> I am trying to create a simple example using the ipf function in R, but I
> could not get it to work. I am very new to R; could anyone
> instruct me on this ipf function?
> Actually, this is what I mean
>      |   50  |   50
> 33.4 | 28.57 | 14.29
> 33.3 | 23.81 | 4.762
> 33.3 | 9.523 | 19.05
> A 3*2 matrix
> raw<-matrix(c(28.571,14.286,23.809,4.762,9.523,19.049),3, 2,byrow=TRUE)
> the sum of margin (the value I am setting as the target)
> then call ipf function:
> fit1<-ipf(table, margins=m,start=raw,eps = 1e-04, maxits = 50, showits
> I could calculate it by hand with 7 iterations, but in the end I am hoping
> to get R's built-in ipf function to do it. What should I put for "table"
> Thanks in advance!