[R] R model developing & validating - Open to Discussion

Bert Gunter bgunter.4567 at gmail.com
Sun Apr 3 19:24:32 CEST 2016


This is way OT for this list, and really has nothing to do with R.
Post on a statistical list like stats.stackexchange.com if you want to
repeat a discussion that has gone on for decades and has no
resolution.

You really should be spending time with the literature, though. Have
you? "Cross validation" and "penalized regression" might be a couple
of terms to start you off, although they are far from sufficient, and
others might suggest better ones.
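
For what it's worth, here is a minimal sketch of k-fold cross validation
in base R. It assumes the built-in mtcars data as a stand-in for your
train.csv, and the lm() formula and k = 5 are arbitrary illustrative
choices of mine, not anything taken from your problem.

```r
# Minimal 5-fold cross-validation sketch in base R (illustrative only).
# ASSUMPTIONS: mtcars stands in for train.csv; the formula and k are arbitrary.
set.seed(1)                                   # reproducible fold assignment
dat  <- mtcars
k    <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(dat)))  # random fold labels

cv_rmse <- sapply(seq_len(k), function(i) {
  fit  <- lm(mpg ~ wt + hp, data = dat[fold != i, ])     # fit on the other k-1 folds
  pred <- predict(fit, newdata = dat[fold == i, ])       # predict the held-out fold
  sqrt(mean((dat$mpg[fold == i] - pred)^2))              # RMSE on the held-out fold
})

mean(cv_rmse)   # cross-validated estimate of prediction error
```

Every row is held out exactly once, so no observation is used both to
fit and to score the same model. A separately supplied test.csv can then
be kept untouched for one final check.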

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Sat, Apr 2, 2016 at 2:40 PM, Norman Polozka <normanmath at gmail.com> wrote:
> Throughout my R journey I have noticed different ways to use given data to
> develop and validate a model.
>
> Assume that you are given data for a problem:
>
> 1. train.csv
> 2. test.csv
>
> *Method A*
>
> *Combine train+test data* and develop a model using the combined data. Then
> use the test data to validate the model based on prediction error analysis.
>
> *Method B*
>
> Use *train data* to develop the model and then use *test data* to validate
> the model based on prediction error analysis.
>
> *Method C*
>
> Subdivide the *train.csv* file into 75% training data and 25% test data, and
> use the new training data to develop the model. Then use the new test data
> to validate the model.
> After that, use the initially given test data to double-check the
> performance of the model.
>
> I have identified 3 methods, so it is a bit confusing which one to use.
>
> *Are there any methods other than these?*
>
> I need opinions from R experts on
>
> 1. What is the best practice?
>
> 2. Does that depend on the scale of the problem (smaller data or big data)?
>
> 3. a) Is a confusion matrix the only way we can check the
> performance of a model?
>
>     b) Are there any other matrices to check the performance?
>
>     c) Does it depend on the type of model (lm(), glm(), tree(), svm(),
> etc.)?
>
>     d) Do we have different matrices for different models to evaluate the
> model?
>
>
> PS: I asked this question on stack but got no response, so I thought I
> would ask you.
>
> Many thanks


