[R] 'all subsets' fitting algorithm for Bayesian approach

Bert Gunter gunter.berton at gene.com
Fri Oct 1 19:32:59 CEST 2010


ummmm... You are reinventing the wheel. In fact, several wheels: the
statistical literature already has several different approaches worked
out for this. For example, George Box and David Steinberg did one
about 20 years ago, and it has been incorporated as one of the options
in the JMP DOE model choice procedure.

So do your homework and save yourself some effort. Maybe even all your effort.

-- Bert

On Fri, Oct 1, 2010 at 7:02 AM, Michael Hopkins
<hopkins at upstreamsystems.com> wrote:
>
> Hi R experts
>
> I am just wondering if something is already available (or easily adaptable) to do the following.
>
> I am planning to build linear models for all possible combinations of terms, so for example if the terms are sent into a function as this string
>
> " X1 + X2 + X3 + X4 + X1:X2"
>
> I would want to build models for all possible combinations of these 5 terms, e.g.
>
>        m1 <- lm( y ~ X1 + X3 )
>
> and capture at least the residual sum of squares and total number of model parameters from each model produced.  This will become part of a Bayesian approach to infer actual model probabilities when specialist prior knowledge is also introduced into the problem.
>
> At a high level this particular problem requires something like:
>
> 1) the term 'string' to be broken down into its elements, which are separated by "+" and, I suppose, stored in a list for easier manipulation
>
> 2) a matrix with 2^5 rows and 5 columns to be formed with a 0 present if the term is not included and 1 if it is.  Then a model will be fitted to represent every row of this matrix and the key statistics stored in vectors of length 2^5
>
> For N terms of course the number of models will be 2^N.
>
> Is there anything available already?  This is a very similar problem to all subsets regression.
>
> My skill at manipulating strings in R is very limited; can anyone recommend some links or available functions which would make the separations and constructions required easy to achieve?
>
> Thanks in advance to all
>
>
> Michael Hopkins
> Algorithm and Statistical Modelling Expert
>
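
For reference, the two steps described in the quoted message can be sketched in base R roughly as follows. This is only a minimal illustration, not a recommendation over the published methods mentioned above: the data frame `dat`, the simulated response, and the helper `fit_one` are all hypothetical and exist purely so the example runs on its own.

```r
# Hypothetical example data; only the column names come from the quoted message.
set.seed(1)
n <- 50
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n), X4 = rnorm(n))
dat$y <- 1 + 2 * dat$X1 - dat$X3 + rnorm(n)

term_string <- " X1 + X2 + X3 + X4 + X1:X2"

# Step 1: split the string on "+" and trim the surrounding whitespace.
terms <- trimws(strsplit(term_string, "+", fixed = TRUE)[[1]])

# Step 2: build the 2^N x N indicator matrix, one row per candidate model
# (1 = term included, 0 = term excluded).
include <- as.matrix(expand.grid(rep(list(0:1), length(terms))))
colnames(include) <- terms

# Fit the model for one row and return its residual sum of squares
# and its number of estimated coefficients (intercept included).
fit_one <- function(row) {
  chosen <- terms[row == 1]
  # reformulate() needs at least one term, so the empty row is fitted
  # as the intercept-only model.
  f <- if (length(chosen) == 0) y ~ 1 else reformulate(chosen, response = "y")
  m <- lm(f, data = dat)
  c(rss = deviance(m), n_par = length(coef(m)))
}

# One row per model: the indicator columns followed by rss and n_par.
results <- data.frame(include, t(apply(include, 1, fit_one)),
                      check.names = FALSE)
```

Note that this enumerates every row, including models that contain X1:X2 without both main effects; whether such models should be kept is a modelling decision the sketch does not make. Since the work grows as 2^N, brute force like this is only practical for a modest number of terms.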



-- 
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml


