[R] FW: logistic regression

Darin Brooks kfmgis at telus.net
Mon Sep 29 03:26:02 CEST 2008


I certainly appreciate your comments, Bert.  It is abundantly clear that I
won't be invited to any of the cocktail parties hosted by the "polite
circles".  I am not a statistician.  I am merely a geographer (in the field
of ecology) trying to develop a predictor to assist in a forestry-based
decision making process.  My work in the natural world has taught me that
NOTHING is predictable ... and the very idea of a bullet-proof ecological
predictive model is doomed to fail.  
That said, there ARE some basic predictors that assist foresters in their
salvage decisions.  They use these on a daily basis.  The problem is that
most of the evidence and modeling is anecdotal.  There really are no models
in the field that I am working in.  And for good reason ... The natural
world isn't interested in being modeled.  I think we can all agree on this -
guru or not.
But even the most basic predictive model (using only the GIS/mappable data
that is readily available to most users) is a starting point.  The resultant
dataset(s) of this potential model will be followed up and field-verified.
Providing this simple starting point (or catalyst, if you will) could
potentially save A LOT of time and money.
What I need to do is to isolate the best available variables into a model
and assign a confidence to it.  It doesn't have to change everyone's world
... it just has to change the way of thinking in my small little world.
These past few days have been an education for me in the subject of stepwise
regression.  I approach it with much more apprehension now.  So if nothing
else good comes of this discussion/exercise/experience ... I've learned
something.

Darin Brooks           

-----Original Message-----
From: Bert Gunter [mailto:gunter.berton at gene.com] 
Sent: Sunday, September 28, 2008 6:26 PM
To: 'David Winsemius'; 'Darin Brooks'
Cc: r-help at stat.math.ethz.ch; ted.harding at manchester.ac.uk
Subject: RE: [R] FW: logistic regression


The Inferno awaits me -- but I cannot resist a comment (but DO look at
Frank's website).

There is a deep and disconcerting dissonance here. Scientists are
(naturally) interested in getting at mechanisms, and so want to know which
of the variables "count" and which do not. But statistical analysis --
**any** statistical analysis -- cannot tell you that. All statistical
analysis can do is build models that give good predictions (and only over
the range of the data). The models you get depend **both** on the way Nature
works **and** the peculiarities of your data (which is what Frank referred
to in his comment on data reduction). In fact, it is highly likely that with
your data there are many alternative prediction equations built from
different collections of covariates that perform essentially equally well.
Sometimes it is otherwise, typically when prospective, carefully designed
studies are performed -- there is a reason that the FDA insists on clinical
trials, after all (and reasons why such studies are difficult and expensive
to do!).

The belief that "data mining" (as it is known in the polite circles that
Frank obviously eschews) is an effective (and even automated!) tool for
discovering how Nature works is a misconception, but one that for many
reasons is enthusiastically promoted.  If you are looking only to predict,
it may do; but you are deceived if you hope for Truth. Can you get hints? --
well maybe, maybe not. Chaos beckons.

I think many -- maybe even most -- statisticians rue the day that stepwise
regression was invented and certainly that it has been marketed as a tool
for winnowing out the "important" few variables from the blizzard of
"irrelevant" background noise. Pogo was right: "We have seen the enemy --
and it is us."
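
[Editorial illustration, not part of the original message.] The point about
winnowing "important" variables out of noise can be made concrete with a
small simulation. The sketch below is written in Python using only the
standard library, and every name in it is hypothetical. It assumes a response
and 44 candidate predictors that are all independent noise (44 echoes the
count mentioned elsewhere in this thread), and it replaces full stepwise
regression with a single forward step: keep the predictor most correlated
with the response, then test it as if it had been chosen in advance. The
p-value uses the Fisher z approximation, which is itself an assumption.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def naive_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via the Fisher z approximation.

    "Naive" because it ignores that r was picked as the best of many.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def best_noise_predictor_p(n_obs, n_pred, rng):
    """One forward-selection step on pure noise.

    Generates a noise response and n_pred noise predictors, keeps the
    predictor with the largest |correlation|, and returns its naive p-value.
    """
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    best_r = 0.0
    for _ in range(n_pred):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        r = pearson_r(x, y)
        if abs(r) > abs(best_r):
            best_r = r
    return naive_p_value(best_r, n_obs)


def fraction_falsely_significant(n_sims=200, n_obs=50, n_pred=44, seed=1):
    """Share of simulations where the selected noise predictor has p < 0.05."""
    rng = random.Random(seed)
    hits = sum(
        best_noise_predictor_p(n_obs, n_pred, rng) < 0.05
        for _ in range(n_sims)
    )
    return hits / n_sims


if __name__ == "__main__":
    print("1 candidate:  ", fraction_falsely_significant(n_pred=1))
    print("44 candidates:", fraction_falsely_significant(n_pred=44))
```

With a single candidate the share of "significant" results should sit near
the nominal 5%. With 44 independent candidates it should be far higher (about
1 - 0.95^44, or roughly 0.9, if the tests were exactly independent), even
though nothing in the data is real. Real stepwise procedures repeat this
selection at every step, so the sketch understates the problem if anything.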

(As I said, the Inferno awaits...)

Cheers to all,
Bert Gunter

DEFINITELY MY OWN OPINIONS HERE!



-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of David Winsemius
Sent: Saturday, September 27, 2008 5:34 PM
To: Darin Brooks
Cc: r-help at stat.math.ethz.ch; ted.harding at manchester.ac.uk
Subject: Re: [R] FW: logistic regression

It's more a statement that it expresses a statistical perspective very
succinctly, somewhat like a Zen koan.  Frank's book, "Regression Modeling
Strategies", has entire chapters on reasoned approaches to your question.
His website also has quite a bit of material free for the taking.

--
David Winsemius
Heritage Laboratories

On Sep 27, 2008, at 7:24 PM, Darin Brooks wrote:

> Glad you were amused.
>
> I assume that "booking this as a fortune" means that this was an 
> idiotic way to model the data?
>
> MARS?  Boosted Regression Trees?  Any of these a better choice to 
> extract significant predictors (from a list of about 44) for a 
> measured dependent variable?
>
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Ted Harding
> Sent: Saturday, September 27, 2008 4:30 PM
> To: r-help at stat.math.ethz.ch
> Subject: Re: [R] FW: logistic regression
>
>
>
> On 27-Sep-08 21:45:23, Dieter Menne wrote:
>> Frank E Harrell Jr <f.harrell <at> vanderbilt.edu> writes:
>>
>>> Estimates from this model (and especially standard errors and
>>> P-values) will be invalid because they do not take into account the
>>> stepwise procedure above that was used to torture the data until
>>> they confessed.
>>>
>>> Frank
>>
>> Please book this as a fortune.
>>
>> Dieter
>
> Seconded!
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 27-Sep-08                                       Time: 23:30:19
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



