[R] mixtures as outcome variables

Kjetil Brinchmann Halvorsen kjetil at acelerate.com
Wed Mar 23 17:36:41 CET 2005


Jason W. Martinez wrote:

>Dear R-users,
>
>I have an outcome variable and I'm unsure about how to treat it. Any
>advice?
>
>I have spending data for each county in the state of California (N=58).
>Each county has been allocated money to spend on any one of the
>following four categories: A, B, C, and D.
>
>Each county may spend the money in any way they see fit. This also means
>that the county need not spend all the money that was allocated to them.
>The data structure looks something like the one below:
>
>COUNTY    A        B       C       D        Total
>----------------------------------------------------
>alameda  2534221  1555592 2835475  3063249  9988537
>alpine   3174     8500    0        45558    55232
>amador    0       0        0        0       0
>....
>
>
>The goal is to explain variation in spending patterns, which are
>presumably the result of characteristics for each county.
>
>I may treat the problem like a simple linear regression problem for each
>category, but by definition, money spent in one category will take away
>the amount of money that can be spent in any other category---and each
>county is not allocated the same amount of money to spend.
>
>I have constructed proportions of amount spent on each category and have
>conducted quasibinomial regression, on each dependent outcome but that
>does not seem very convincing to me. 
>
>Would anyone have any advice about how to treat an outcome variable of
>this sort?
>
>Thanks for any hints!
>
>Jason
>
If you concentrate only on the relative proportions, these are called
compositional data. If your data are in a matrix
mydata (n x 4), you obtain the compositions by
sweep(mydata, 1, apply(mydata, 1, sum), "/")
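
A minimal sketch of that step, using the three rows shown in the question.
The matrix layout and the choice to drop counties with a zero total are
assumptions made for illustration, not something the original data
prescribes:

```r
# Three rows copied from the question; the matrix layout is illustrative
mydata <- matrix(c(2534221, 1555592, 2835475, 3063249,
                   3174,    8500,    0,       45558,
                   0,       0,       0,       0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("alameda", "alpine", "amador"),
                                 c("A", "B", "C", "D")))

totals <- rowSums(mydata)                 # same as apply(mydata, 1, sum)
comp   <- sweep(mydata, 1, totals, "/")   # divide each row by its total

# A county that spent nothing has total 0, so its row becomes NaN (0/0).
# Dropping such rows is one possible choice, not the only one.
comp_ok <- comp[totals > 0, , drop = FALSE]
rowSums(comp_ok)                          # each remaining row sums to 1
```

Whether counties with no spending should be dropped or modelled separately
depends on the question being asked.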

AFAIK there are no functions/packages for R specifically for
compositional data, but you
can try googling. Aitchison has a monograph (Chapman & Hall) and a
paper in JRSS B.

One way to start might be lm or anova on the symmetric logratio
transform of the compositions. The R function lm can take a
multivariate (matrix) response, but some extra programming will be
needed for interpretation. With simulated data:

 > slr
function(y) { # y should sum to 1
          v <- log(y)
          return( v - mean(v) ) }
 > testdata <- matrix( rgamma(120, 2,3), 30, 4)
 > str(testdata)
 num [1:30, 1:4] 0.200 0.414 0.311 2.145 0.233 ...
 > comp <- sweep(testdata, 1, apply(testdata,1,sum), "/")
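 > # To get the symmetric logratio transform:
 > comp <- t(apply(comp, 1, slr))
 > # Observe: each transformed row sums to zero, so the rows of the
 > # covariance matrix sum to zero (up to rounding) and it is singular:
 > apply(cov(comp), 1, sum)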
[1] -5.551115e-17  2.775558e-17  5.551115e-17 -2.775558e-17
 > lm( comp ~ 1)

Call:
lm(formula = comp ~ 1)

Coefficients:
             [,1]      [,2]      [,3]      [,4]   
(Intercept)   0.17606   0.06165  -0.03783  -0.19988

 > summary(lm( comp ~ 1))
Response Y1 :

Call:
lm(formula = Y1 ~ 1)

Residuals:
     Min       1Q   Median       3Q      Max
-1.29004 -0.46725 -0.07657  0.55834  1.20551

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]   0.1761     0.1265   1.391    0.175

Residual standard error: 0.6931 on 29 degrees of freedom


Response Y2 :

Call:
lm(formula = Y2 ~ 1)

Residuals:
    Min      1Q  Median      3Q     Max
-1.2982 -0.5711 -0.1355  0.5424  1.6598

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]  0.06165    0.15049    0.41    0.685

Residual standard error: 0.8242 on 29 degrees of freedom


Response Y3 :

Call:
lm(formula = Y3 ~ 1)

Residuals:
     Min       1Q   Median       3Q      Max
-1.97529 -0.41115  0.03666  0.42785  0.88567

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,] -0.03783    0.11623  -0.325    0.747

Residual standard error: 0.6366 on 29 degrees of freedom


Response Y4 :

Call:
lm(formula = Y4 ~ 1)

Residuals:
    Min      1Q  Median      3Q     Max
-2.8513 -0.3955  0.2815  0.5939  1.2475

Coefficients:
     Estimate Std. Error t value Pr(>|t|)
[1,]  -0.1999     0.1620  -1.234    0.227

Residual standard error: 0.8872 on 29 degrees of freedom
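
As a rough sketch of the "extra programming" for interpretation, the same
approach can be tried with a covariate, and the fitted logratio scores
mapped back to proportions. The covariate x, the seed, and the helper
inv_slr are hypothetical, introduced only for this illustration. Zero
components (such as alpine's C in the question) are not handled here,
since log(0) is -Inf:

```r
set.seed(1)                                      # arbitrary seed, reproducibility only
slr <- function(y) { v <- log(y); v - mean(v) }  # same transform as in the session

n <- 30
testdata <- matrix(rgamma(n * 4, 2, 3), n, 4)    # simulated positive amounts
comp <- sweep(testdata, 1, rowSums(testdata), "/")
z    <- t(apply(comp, 1, slr))                   # symmetric logratio scores

x   <- rnorm(n)                                  # hypothetical county characteristic
fit <- lm(z ~ x)                                 # one regression per column of z

# Each row of the coefficient matrix sums to zero (up to rounding)
# across the four responses, because every row of z sums to zero
rowSums(coef(fit))

# Map fitted scores back to proportions: exponentiate, then renormalise
inv_slr <- function(v) exp(v) / sum(exp(v))      # hypothetical helper
fitted_comp <- t(apply(fitted(fit), 1, inv_slr))
rowSums(fitted_comp)                             # each row sums to 1
```

On the logratio scale, a coefficient describes the change in the log of one
share relative to the geometric mean of all four shares, so the
back-transformed fitted compositions are often easier to read.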


Sorry for not being of more help!

Kjetil


-- 

Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
               --  Mahdi Elmandjra




