Data Science Asked on February 12, 2021
I want to build a classifier for a data frame that has four classes, where each row belongs to exactly one class. I have two predictive models: a neural network and a tree classifier, but they put every sample into a single class, during training and therefore during testing.
The problem is that the classification from my neural network is no better than random.
Here is how I call the model:
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.models import load_model
model = create_model(x_train.shape[1], y_train.shape[1])
epochs = 30
batch_sz = 64
print("Beginning model training with batch size {} and {} epochs".format(batch_sz, epochs))
checkpoint = ModelCheckpoint("lc_model.h5", monitor='val_acc', verbose=0, save_best_only=True, mode='auto', period=1)
# train the model
history = model.fit(x_train.to_numpy(),
                    y_train.to_numpy(),
                    validation_split=0.2,
                    epochs=epochs,
                    batch_size=batch_sz,
                    verbose=2,
                    # class_weight=weights,  # class_weight tells the model to "pay more attention" to samples from an under-represented grade class.
                    # callbacks=[checkpoint]
                    )
# revert to the best model encountered during training
model = load_model("lc_model.h5", compile=False)
Here is the architecture of the model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.constraints import MaxNorm
# from tensorflow.python.compiler.tensorrt import trt_convert as trt
def create_model(input_dim, output_dim):
    print(output_dim)
    # create model
    model = Sequential()
    print("sequential")
    # input layer
    model.add(Dense(100, input_dim=input_dim, activation='relu', kernel_constraint=MaxNorm(3)))
    model.add(Dropout(0.2))
    # hidden layer
    model.add(Dense(60, activation='relu', kernel_constraint=MaxNorm(3)))
    model.add(Dropout(0.2))
    # output layer
    model.add(Dense(output_dim, activation='softmax'))
    # Compile model
    model.compile(loss=focal_loss(alpha=1), loss_weights=None, optimizer='nadam', metrics=['accuracy'])
    return model
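The focal_loss function itself is not shown here; it is a custom loss defined elsewhere. For context, a typical Keras-style categorical focal loss looks roughly like the sketch below (an illustration only, not necessarily the implementation used above). Without the clipping step, log(0) can push the loss to ±inf, which may be related to the -inf values in the training log further down.
import tensorflow.keras.backend as K

def focal_loss(alpha=0.25, gamma=2.0):
    # Returns a loss function usable in model.compile; it down-weights easy examples.
    def loss(y_true, y_pred):
        # Clip predictions away from 0 and 1 to avoid log(0)
        y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
        cross_entropy = -y_true * K.log(y_pred)
        weight = alpha * K.pow(1.0 - y_pred, gamma)
        return K.sum(weight * cross_entropy, axis=-1)
    return loss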
Here is a part of x_train:
id reg 0.0_x 1.0_x 17.0 21.0 30.0 40.0 50.0 60.0 70.0 Célibataire Divorcé(e) Marié et j'ai des enfants à charge Marié et je n'ai pas encore d'enfants à charge Refus de répondre Veuf (ve) 1er cycle universitaire / Licence 2e cycle universtaire / Master 3e cycle universtaire / Doctorat BTS Je n'ai jamais été à l'école Niveau collège Niveau lycée Niveau primaire Autre. Merci de préciser :@NS$ Infirme J'ai une société Je ne travaille pas Je suis commerçant Je suis encore étudiant Je suis independent Je suis journalier, je travaille de temps à autre Je suis retraité Je travaille dans le secteur privé Je travaille dans le secteur public 0.0_y 250.0 3750.0 7500.0 8750.0 11250.0 11500.0 18750.0 25000.0 35000.0 45000.0 50000.0 0.0_x.1 1.0_y 0.0_y.1 1.0_x.1 Je ne suis pas d'accord Je suis d'accord False_x True_y False_y True_x False_x.1 True_y.1 False_y.1 True_x.1 False_x.2 True_y.2 False_y.2 True_x.2 False_x.3 True_y.3 False_y.3 True_x.3 False_x.4 True_y.4 False_y.4 True_x.4 False_x.5 True_y.5 False_y.5 True_x.5 0.0_x.2 1.0_y.1 0.0_y.2 1.0_x.2 0.0_x.3 1.0_y.2 0.0_y.3 1.0
0 NaN 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 1 0 0 1 0 1
1 NaN 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 0 1 0 1 0 0 1 1 0 1 0 1 0 1 0 1 0 0 0
...
And here is part of y_train:
Voting intention in 2021_Cast a blank vote Voting intention in 2021_I know who I will be voting for in 2021 Voting intention in 2021_I won't vote Voting intention in 2021_I'm going to vote in 2021 but don't know for who
0 0 1 0 0
1 0 0 0 1
...
So when I train and validate this model, it does no better than random:
sequential
Beginning model training with batch size 64 and 30 epochs
WARNING:tensorflow:`period` argument is deprecated. Please use `save_freq` to specify the frequency in number of samples seen.
Train on 768 samples, validate on 192 samples
Epoch 1/30
768/768 - 1s - loss: -inf - acc: 0.2448 - val_loss: -inf - val_acc: 0.2708
Epoch 2/30
768/768 - 0s - loss: -inf - acc: 0.2409 - val_loss: -inf - val_acc: 0.2708
...
Epoch 30/30
768/768 - 0s - loss: -inf - acc: 0.2409 - val_loss: -inf - val_acc: 0.2708
Indeed, the accuracy is just below 25%, which is what I would expect from randomly assigning the classes, and the model never seems to learn anything, as the loss stays at -inf.
I then evaluate the model on the test set, and it is even worse. Indeed, with the following code:
import numpy as np
from sklearn.metrics import f1_score
y_pred = model.predict(x_test.to_numpy())
# Revert one-hot encoding to classes
y_pred_classes = pd.DataFrame((y_pred.argmax(1)[:,None] == np.arange(y_pred.shape[1])),
columns=y_test.columns,
index=y_test.index)
y_test_vals = y_test.idxmax(1)
y_pred_vals = y_pred_classes.idxmax(1)
# F1 score
# Use idxmax() to convert back from one-hot encoding
f1 = f1_score(y_test_vals, y_pred_vals, average='weighted')
print("Test Set Accuracy: {:.2%} (But results would have been better if trained on the FULL dataset)".format(f1))
I get: Test Set Accuracy: 10.92%
I don't understand it; this is an architecture that I had managed to make work on another loan classification problem.
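As a side note, the snippet above actually prints a weighted F1 score under the "Accuracy" label. Plain accuracy could be computed alongside it with something like this (a small addition reusing y_test_vals and y_pred_vals from above):
from sklearn.metrics import accuracy_score

# Fraction of rows where the predicted class matches the true class
acc = accuracy_score(y_test_vals, y_pred_vals)
print("Test set accuracy: {:.2%}".format(acc))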
All of my earlier modelling attempts were either unweighted or did not use focal loss.
I tried to deal with the class imbalance in different ways, such as resampling. Without resampling, I used class weights:
weights = df_en2['Voting intention in 2021'].value_counts(normalize=True)
weights = weights.sort_index().tolist()
weights = {0: 1 / weights[0],
           1: 1 / weights[1],
           2: 1 / weights[2],
           3: 1 / weights[3]}
where df_en2 is the dataframe from which x_train, y_train, x_test and y_test are obtained with the split_data() function, which you can find here (it's basically the same architecture, but for the loan classification problem).
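For reference, a minimal sketch of how that weights dict could be fed back into the Keras training call shown earlier (this reuses create_model and the DataFrames from above; Keras maps one-hot targets to class indices internally when applying class_weight):
# Hypothetical re-run of training with the inverse-frequency weights enabled
model = create_model(x_train.shape[1], y_train.shape[1])
history = model.fit(x_train.to_numpy(),
                    y_train.to_numpy(),
                    validation_split=0.2,
                    epochs=30,
                    batch_size=64,
                    class_weight=weights,  # dict keys 0..3 match the four one-hot columns
                    verbose=2)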
In comparison, with a tree classifier, if I leave max_depth at None, the nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples, and I get an average accuracy of 42%.
# importing necessary libraries
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
# dividing X, y into train and test data
# X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)
# training a DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier
dtree_model = DecisionTreeClassifier().fit(x_train, y_train)
dtree_predictions = dtree_model.predict(x_test)
# creating a confusion matrix
cm = confusion_matrix(y_test.values.argmax(axis = 1), dtree_predictions.argmax(axis = 1))
And it returns:
array([[0.19047619, 0.15873016, 0.45454545, 0.27118644],
[0.15873016, 0.38095238, 0.2 , 0.30508475],
[0.15873016, 0.15873016, 0.4 , 0.22033898],
[0.19047619, 0.19047619, 0.21818182, 0.38983051]])
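Note that confusion_matrix returns raw counts, so the proportions above come from a normalization step that is not shown in the snippet. A common choice (an assumption here, not necessarily what produced these exact numbers) is to normalize each row by the true-class counts:
# Row-normalize the confusion matrix so each row sums to 1 (per-class recall view)
cm_normalized = cm / cm.sum(axis=1, keepdims=True)
print(cm_normalized)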
With multi-label classification you want to use binary_crossentropy loss and sigmoid activation on the final layer since each class can take a value between 0 and 1. In your current case, your model is assuming that the sum of all values from the last layer should total 1, when in reality the total could be more than 1. Here is a good reference: https://towardsdatascience.com/multi-label-image-classification-with-neural-network-keras-ddc1ab1afede
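For example, the output layer and compile step could be changed along these lines (a sketch based on the architecture in the question, with everything else assumed unchanged):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.constraints import MaxNorm

def create_model(input_dim, output_dim):
    model = Sequential()
    model.add(Dense(100, input_dim=input_dim, activation='relu', kernel_constraint=MaxNorm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(60, activation='relu', kernel_constraint=MaxNorm(3)))
    model.add(Dropout(0.2))
    # sigmoid gives each class an independent score in [0, 1] instead of a softmax distribution
    model.add(Dense(output_dim, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='nadam', metrics=['accuracy'])
    return model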
Answered by LiamFiddler on February 12, 2021