Data Science: Asked on March 7, 2021
I have been practicing with the following dataset:
http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength
for building a prediction model based on an MLP, but I have some doubts about whether my approach is correct. I wanted to tune the activation function over the following options: identity, logistic, tanh, and relu. So what I did is the following:
First I divided my dataset into training, validation, and test sets (80/20/20); as far as I know, hyperparameter tuning is done on the validation set. My pseudocode for the validation part looks like this:
for each activation in [identity, logistic, tanh, relu]:
    model = MLP(activation=activation, solver="adam", max_iter=1000)
    model.fit(Xtrain, ytrain)
    plot(model.loss_curve_)   # loss curve from the training fit
    model.fit(Xval, yval)     # refit on the validation set
    plot(model.loss_curve_)   # loss curve from the validation fit
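In actual scikit-learn terms, the runnable version looks roughly like this (the exact split proportions, the feature scaling, and the file name are approximations of my setup, not an exact transcript):

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

# Load the UCI concrete dataset; the last column is the compressive strength target
data = pd.read_excel("Concrete_Data.xls")
X, y = data.iloc[:, :-1].values, data.iloc[:, -1].values

# Hold out 20% for test, then 20% of the remainder for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)

# Standardize features (MLPs converge much better on scaled inputs)
scaler = StandardScaler().fit(X_train)
X_train, X_val, X_test = (scaler.transform(s) for s in (X_train, X_val, X_test))

for activation in ["identity", "logistic", "tanh", "relu"]:
    model = MLPRegressor(activation=activation, solver="adam", max_iter=1000, random_state=0)
    model.fit(X_train, y_train)
    plt.plot(model.loss_curve_, label=activation + " (train)")
    model.fit(X_val, y_val)  # refit from scratch on the validation set, as in my pseudocode
    plt.plot(model.loss_curve_, label=activation + " (val)")
plt.xlabel("iteration"); plt.ylabel("loss"); plt.legend(); plt.show()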
With this loop I found that the "best" activation function was "relu". Here are two graphs as an example:
After that I took "adam" and "relu" as hyperparameters, and then I tried them with the training and test sets. Roughly, I did this:
model = MLP(activation="relu", solver="adam", max_iter=1000)
model.fit(Xtrain, ytrain)
plot(model.loss_curve_)   # loss curve from the training fit
model.fit(Xtest, ytest)   # refit on the test set
plot(model.loss_curve_)   # loss curve from the test fit
and the curve I got was the following:
What I wanted to know is whether my approach is correct. I ask this because it is not easy to find examples of loss curves using scikit-learn. I think this is because in many Internet tutorials hyperparameter tuning is done with GridSearch or cross-validation, and the ones that do use loss curves are implemented in Keras or TensorFlow.
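For comparison, the GridSearch-style tuning those tutorials do would look something like this (a rough sketch; the 5-fold CV and the scoring metric are just common choices, not something I ran):

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

param_grid = {"activation": ["identity", "logistic", "tanh", "relu"]}
search = GridSearchCV(
    MLPRegressor(solver="adam", max_iter=1000, random_state=0),
    param_grid,
    cv=5,                              # 5-fold cross-validation within the training data
    scoring="neg_mean_squared_error",  # regression metric; closer to zero is better
)
search.fit(X_train, y_train)  # the test set stays untouched until the very end
print(search.best_params_)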
I wanted to force my model to produce a curve like this (training loss still decreasing while validation loss turns upward), which is an overfitted model, just for the sake of learning. So I was wondering: do all models overfit? Or what is happening in my tests? Maybe I did something wrong.
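One way I can think of to get separate train/validation loss curves out of scikit-learn, and to try to provoke that overfitting picture, is to train epoch by epoch with partial_fit and track both losses myself. A sketch (the oversized hidden layers are deliberate, and MSE is just a stand-in for the internal training loss):

from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# A deliberately oversized network to encourage overfitting
model = MLPRegressor(hidden_layer_sizes=(512, 512), activation="relu",
                     solver="adam", random_state=0)

train_loss, val_loss = [], []
for epoch in range(500):
    model.partial_fit(X_train, y_train)  # one pass over the training data
    train_loss.append(mean_squared_error(y_train, model.predict(X_train)))
    val_loss.append(mean_squared_error(y_val, model.predict(X_val)))

plt.plot(train_loss, label="train MSE")
plt.plot(val_loss, label="validation MSE")  # typically flattens or rises while train keeps falling
plt.xlabel("epoch"); plt.ylabel("MSE"); plt.legend(); plt.show()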
Any help would be greatly appreciated.
Thanks