[R] R model developing & validating - Open to Discussion

Sat Apr 2 23:40:57 CEST 2016

Throughout my R journey I have noticed several ways in which the given data
can be used to develop and validate a model.

Assume that you are given two data files for a problem:

1. train.csv
2. test.csv

*Method A*

*Combine train+test data* and develop a model using the combined data. Then
use the test data to validate the model based on prediction error analysis.

*Method B*

Use *train data* to develop the model and then use *test data* to validate
the model based on prediction error analysis.
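To make Method B concrete, here is a minimal sketch of what I mean. It assumes the built-in mtcars data as a stand-in for train.csv and test.csv, and the formula and the choice of RMSE as the error measure are only illustrative:

```r
# Stand-ins for train.csv and test.csv (assumption: built-in mtcars data)
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# Develop the model on the training data only
fit <- lm(mpg ~ wt + hp, data = train)

# Validate on the test data: root mean squared prediction error
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse
```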

*Method C*

Subdivide *train.csv* into 75% training data and 25% test data, and use the
new training data to develop the model. Then use the new test data to
validate the model.
After that, use the initially given test data to double-check the
performance of the model.
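A rough sketch of the split I have in mind for Method C, again assuming mtcars as a stand-in for the two files. The names train_full, test_given, train_new, valid_new and the helper rmse() are just for illustration:

```r
set.seed(1)                      # make the random split reproducible
train_full <- mtcars[1:24, ]     # stand-in for train.csv (assumption)
test_given <- mtcars[25:32, ]    # stand-in for test.csv (assumption)

# Randomly pick 75% of the rows of train_full for model development
n   <- nrow(train_full)
idx <- sample(n, size = floor(0.75 * n))
train_new <- train_full[idx, ]
valid_new <- train_full[-idx, ]  # the remaining 25%

fit <- lm(mpg ~ wt + hp, data = train_new)

# Hypothetical helper: root mean squared prediction error on a data set
rmse <- function(model, data) {
  sqrt(mean((data$mpg - predict(model, newdata = data))^2))
}

rmse(fit, valid_new)   # error on the 25% held out from train.csv
rmse(fit, test_given)  # double check on the initially given test data
```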

I have identified 3 methods, so it is a bit confusing which one to use.

*Are there any other methods besides these?*

I need opinions from R experts on

1. What is the best practice?

2. Does that depend on the scale of the problem (smaller data or big data)?

3. a) Is a confusion matrix the only way to check the performance of a
model?

    b) Are there any other metrics to check the performance?

    c) Does it depend on the type of model (lm(), glm(), tree(), svm(),
etc.)?

    d) Do we have different metrics for different models to evaluate the
model?
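To illustrate question 3: as far as I understand, a confusion matrix only applies when the response is a class, while a numeric response needs an error measure such as RMSE or MAE. A small sketch, assuming mtcars and a 0.5 probability cutoff purely for illustration, and evaluating in-sample only to keep it short:

```r
# Classification: logistic regression, summarised by a confusion matrix
clf  <- glm(am ~ wt, data = mtcars, family = binomial)
prob <- predict(clf, type = "response")
pred_class <- factor(as.integer(prob > 0.5), levels = c(0, 1))
actual     <- factor(mtcars$am, levels = c(0, 1))
cm <- table(predicted = pred_class, actual = actual)
accuracy <- sum(diag(cm)) / sum(cm)
cm
accuracy

# Regression: linear model, summarised by RMSE and MAE
reg <- lm(mpg ~ wt + hp, data = mtcars)
res <- mtcars$mpg - predict(reg)
rmse <- sqrt(mean(res^2))
mae  <- mean(abs(res))
c(rmse = rmse, mae = mae)
```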

PS: I asked this question on stack but got no response, so I thought I
would ask you.

Many thanks
