[R] R: Optimization

Tue Sep 5 16:10:34 CEST 2006

Thanks for the help.
I thought about taking the logarithms of the expression but to cope with the
problem of zero values both in yield and the values of the environmental
variables another transformation is needed (I simply add one to the yield
value and to the environmental variables values). I follow the suggestion
kindly provided by Spencer Graves but I found the following problems, which
depend probably on my understanding:
1) I don't understand why it is needed to test the no-constant model (which
provides a better fitting of the model). Can I simply assume a no-constant
model? 
2) Here comes the big problem. I don't understand what it is done with the
models fit.0 and fit.1. In both cases I have to leave out one variable from
the linear model but I don't understand why in this way I would test the
constraint that all the coefficients should sum to 1. And it is not possible
to apply anova(fit0,fit.0) and anova(fit1,fit.1) because I have different
response variables. 

In conclusion, I don't actually know how to proceed and if in R this kind of
analysis is possible.
Any help and any further explanation would be very appreciated. Below I
report part of the data frame I'm using for the analysis. The environmental
variables (Salinity, Hydrodynamism, Sediment, Oxygen, Chlorophyll and
Bathymetry) values have been transformed using suitability function (and
thus are bounded between 0 and 1) while Yield is in kg/m2.

   Sedi Sal Bathy Chl Hydro Oxy  Yield
1  1.00 1.00 0.51   1 0.17 0.75 1.5
2  0.50 0.95 1.00   1 0.09 0.94 0.4
3  0.50 1.00 0.17   1 0.44 0.90 1.8
4  1.00 0.98 1.00   1 0.10 0.89 4.5
5  0.13 0.84 0.73   1 0.16 0.84 0.4
6  0.50 0.90 0.91   1 0.22 0.84 0.4
7  0.13 0.75 1.00   1 0.14 0.86 0.2
8  0.13 0.84 0.75   1 0.10 0.83 0.3
9  0.13 0.78 0.97   1 0.06 0.84 0.5
10 0.13 0.87 0.70   1 0.45 0.85 1.0
11 1.00 0.77 1.00   1 0.19 0.86 1.5
12 1.00 0.94 0.81   1 0.47 0.86 3.0
13 1.00 0.93 1.00   1 0.45 0.89 2.5
14 0.50 1.00 1.00   1 0.54 0.84 4.0
15 0.50 1.00 1.00   1 0.25 0.88 2.2
16 1.00 1.00 0.56   1 0.25 0.90 5.0
17 1.00 0.90 0.56   1 0.40 0.90 1.5
18 0.50 0.97 1.00   1 0.22 0.95 1.0
19 0.54 0.96 1.00   1 0.18 0.91 0.3
20 1.00 0.97 0.33   1 0.39 0.90 3.0

And here the results of the fitted models so far.

summary(fit0)

Call:
lm(formula = log(Yield + 1) ~ log(Sal + 1) + log(Bathy + 1) + log(Chl + 
    1) + log(Hydro + 1) + log(Oxy + 1) + log(Sedi + 1) - 1, data = data.df)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.05485 -0.23759  0.01331  0.18692  1.23803 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
log(Sal + 1)   5.07193    1.00584   5.042 1.98e-06 ***
log(Bathy + 1)  0.05804    0.20684   0.281 0.779561    
log(Chl + 1)  -8.53941    2.11720  -4.033 0.000106 ***
log(Hydro + 1)  1.73835    0.28815   6.033 2.56e-08 ***
log(Oxy + 1)   4.19951    1.98459   2.116 0.036750 *  
log(Sedi + 1)  1.15953    0.14807   7.831 4.51e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.3735 on 103 degrees of freedom
Multiple R-Squared: 0.9148,     Adjusted R-squared: 0.9098 
F-statistic: 184.2 on 6 and 103 DF,  p-value: < 2.2e-16 

> summary(fit1)

Call:
lm(formula = log(Yield + 1) ~ log(Sal + 1) + log(Bathy + 1) + log(Chl + 
    1) + log(Hydro + 1) + log(Oxy + 1) + log(Sedi + 1), data = (data.df))

Residuals:
      Min        1Q    Median        3Q       Max 
-1.057766 -0.227720  0.006146  0.192200  1.222543 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -4.49400    4.76603  -0.943   0.3479    
log(Sal + 1)   5.13499    1.00861   5.091 1.63e-06 ***
log(Bathy + 1)  0.06685    0.20716   0.323   0.7476    
log(Chl + 1)  -2.48909    6.75718  -0.368   0.7134    
log(Hydro + 1)  1.70674    0.29025   5.880 5.22e-08 ***
log(Oxy + 1)   4.63758    2.03928   2.274   0.0251 *  
log(Sedi + 1)  1.14005    0.14959   7.621 1.34e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.3737 on 102 degrees of freedom
Multiple R-Squared: 0.7589,     Adjusted R-squared: 0.7447 
F-statistic: 53.51 on 6 and 102 DF,  p-value: < 2.2e-16 

> summary(fit.0)

Call:
lm(formula = log((Yield + 1)/(Sedi + 1)) ~ log((Sal + 1)/(Sedi + 
    1)) + log((Bathy + 1)/(Sedi + 1)) + log((Chl + 1)/(Sedi + 
    1)) + log((Hydro + 1)/(Sedi + 1)) + log((Oxy + 1)/(Sedi + 
    1)) - 1, data = subset(data.df))

Residuals:
    Min      1Q  Median      3Q     Max 
-1.5762 -0.2591  0.1065  0.5023  1.4533 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)    
log((Sal + 1)/(Sedi + 1))    3.0170     1.5304   1.971  0.05134 .  
log((Bathy + 1)/(Sedi + 1))  -0.3982     0.3139  -1.268  0.20748    
log((Chl + 1)/(Sedi + 1))    8.8137     2.3941   3.681  0.00037 ***
log((Hydro + 1)/(Sedi + 1))   0.3309     0.4066   0.814  0.41760    
log((Oxy + 1)/(Sedi + 1))  -12.5242     2.1881  -5.724 1.01e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.5766 on 104 degrees of freedom
Multiple R-Squared: 0.523,      Adjusted R-squared: 0.5001 
F-statistic:  22.8 on 5 and 104 DF,  p-value: 2.179e-15 

> summary(fit.1)

Call:
lm(formula = log((Yield + 1)/(Sedi + 1)) ~ log((Sal + 1)/(Sedi + 
    1)) + log((Bathy + 1)/(Sedi + 1)) + log((Chl + 1)/(Sedi + 
    1)) + log((Hydro + 1)/(Sedi + 1)) + log((Oxy + 1)/(Sedi + 
    1)), data = subset(data.df))

Residuals:
     Min       1Q   Median       3Q      Max 
-1.05552 -0.23441  0.01263  0.18355  1.24473 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  1.84916    0.15474  11.950  < 2e-16 ***
log((Sal + 1)/(Sedi + 1))    5.03863    1.00977   4.990 2.46e-06 ***
log((Bathy + 1)/(Sedi + 1))   0.05279    0.20767   0.254   0.7999    
log((Chl + 1)/(Sedi + 1))  -10.96682    2.27270  -4.825 4.86e-06 ***
log((Hydro + 1)/(Sedi + 1))   1.74631    0.28980   6.026 2.64e-08 ***
log((Oxy + 1)/(Sedi + 1))    3.95938    1.98206   1.998   0.0484 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.3751 on 103 degrees of freedom
Multiple R-Squared: 0.5606,     Adjusted R-squared: 0.5393 
F-statistic: 26.28 on 5 and 103 DF,  p-value: < 2.2e-16

> anova(fit1,fit0)
Analysis of Variance Table

  Res.Df     RSS  Df Sum of Sq      F Pr(>F)
1    102 14.2409                            
2    103 14.3650  -1   -0.1241 0.8891 0.3479

Thanks for the help

Simone Vincenzi

_________________________________________
Simone Vincenzi, PhD Student 
Department of Environmental Sciences
University of Parma
Parco Area delle Scienze, 33/A, 43100 Parma, Italy
Phone: +39 0521 905696
Fax: +39 0521 906611
e.mail: svincenz at nemo.unipr.it 

-----Messaggio originale-----
Da: Spencer Graves [mailto:spencer.graves at pdf.com] 
Inviato: lunedì 4 settembre 2006 21.31
A: Simone Vincenzi
Cc: r-help at stat.math.ethz.ch
Oggetto: Re: [R] Optimization

      Have you considered talking logarithms of the expression you 
mentioned: 

      log(Yield) = a1*log(A)+b1*log(B)+c2*log(C)+...

where a1 = a/(a+b+...), etc.  This model has two constraints not present 
in ordinary least squares:  First, the intercept is assumed to be zero.  
Second, the coefficients in this log formulation must sum to 1.  If I 
were you, I might use something like "lm" to test them both. 

      To explain how, I'll modify the notation, replacing A by X1, B by 
X2, ..., up to Xkm1 (= X[k-1]) and Xk for k different environmental 
variables.  Then I might try something like the following: 

      fit0 <- lm(log(Yield) ~ log(X1) + ... + log(Xk)-1 )
      fit1 <- lm(log(Yield) ~ log(X1) + ... + log(Xk) )
      fit.1 <- lm(log(Yield/Xk) ~ log(X1/Xk) + ... + log(Xkm1/Xk) )
      fit.0 <- lm(log(Yield/Xk) ~ log(X1/Xk) + ... + log(Xkm1/Xk)-1 )

      anova(fit1, fit0) would test the no-constant model, and if I 
haven't made a mistake in this, anova(fit0, fit.0) and anova(fit1, 
fit.1) would test the constraint that all the coefficients should sum to 
1. 

      If you would like further help from this listserve, please provide 
commented, minimal, self-contained, reproducible code to help potential 
respondents understand your question and concerns (as suggested in the 
posting guide "www.R-project.org/posting-guide.html"). 

      Hope this helps. 
      Spencer Graves

Simone Vincenzi wrote:
> Dear R-list,
> I'm trying to estimate the relative importance of 6 environmental
variables
> in determining clam yield. To estimate clam yield a previous work used the
> function Yield = (A^a*B^b*C^c...)^1/(a+b+c+...) where A,B,C... are the
> values of the environmental variables and the weights a,b,c... have not
been
> calibrated on data but taken from literature. Now I'd like to estimate the
> weights a,b,c... by using a dataset with 110 observations of yield and
> values of the environmental variables. I'm wondering if it is feasible or
if
> the number of observation is too low, if some data transformation is
needed
> and which R function is the most appropriate to try to estimate the
weights.
> Any help would be greatly appreciated.
>
> Simone Vincenzi 
>
> _________________________________________
> Simone Vincenzi, PhD Student 
> Department of Environmental Sciences
> University of Parma
> Parco Area delle Scienze, 33/A, 43100 Parma, Italy
> Phone: +39 0521 905696
> Fax: +39 0521 906611
> e.mail: svincenz at nemo.unipr.it 
>
>
>
> --
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>   

-- 
No virus found in this incoming message.

--