
How to tune a weighted voting ensemble method?

Data Science Asked on June 5, 2021

I am working on kidney cancer patients' data with 5 imbalanced labels. The feature-engineering part of the code covers normalization and oversampling. Nine standard machine learning methods are used for the classification task, and on top of them I use two ensemble methods: hard voting and weighted (soft) voting. 10-fold cross-validation is used to validate the results.

methods = ['Support Vector Machine', 'Logistic Regression', 'K Neighbors Classifier', 'Random Forest', 
          'Gaussian Naive Bayes', 'Linear Discriminant Analysis', 'Decision Tree', 'Gradient Boosting',
          'soft_VotingClassifier','hard_VotingClassifier']

I would like to know how to tune the weights for the soft voting method.
Here are the code and results I have right now:

from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import VotingClassifier

if method == 'soft_VotingClassifier':
    cl1 = LogisticRegression()
    cl2 = KNeighborsClassifier(n_neighbors=10)
    cl3 = RandomForestClassifier(max_depth=35)
    cl4 = GaussianNB()
    cl5 = LinearDiscriminantAnalysis()
    cl6 = DecisionTreeClassifier()
    # probability=True is required so SVC exposes predict_proba for soft voting
    cl7 = SVC(C=0.1, gamma=0.0001, kernel='poly', probability=True)
    cl8 = GradientBoostingClassifier()
    estimators = [('lr', cl1), ('knn', cl2), ('rf', cl3), ('gnb', cl4),
                  ('lda', cl5), ('dt', cl6), ('svc', cl7), ('gb', cl8)]
    eclf = VotingClassifier(estimators=estimators, voting='soft',
                            weights=[5, 5, 10, 5, 6, 8, 4, 10])

if method == 'hard_VotingClassifier':
    cl1 = LogisticRegression()
    cl2 = KNeighborsClassifier(n_neighbors=10)
    cl3 = RandomForestClassifier(max_depth=35)
    cl4 = GaussianNB()
    cl5 = LinearDiscriminantAnalysis()
    cl6 = DecisionTreeClassifier()
    cl7 = SVC(kernel='linear', gamma='scale')
    cl8 = GradientBoostingClassifier()
    estimators = [('lr', cl1), ('knn', cl2), ('rf', cl3), ('gnb', cl4),
                  ('lda', cl5), ('dt', cl6), ('svc', cl7), ('gb', cl8)]
    eclf = VotingClassifier(estimators=estimators, voting='hard')

Confusion Matrix results on test data:

[50  5  1  4  2]
[ 0 13  1  0  3]
[ 0  1  2  1  1]
[ 4  0  0  2  0]
[ 0  3  0  0  2]

Accuracy results on test data:

[accuracy figure from the original post not reproduced here]

For those kind people who want to check my code or even run it, I will put the repository link here.

One Answer

This is a classical optimization problem.

To simplify, let's say that you have two models (m1 and m2) and you want to build an ensemble of them.

A first idea is to take the mean of both predictions:

(m1.predict(X) + m2.predict(X))/2

Then you want to find the best weighted combination of the two models:

for w in [0, 0.25, 0.5, 0.75, 1]:
    preds = w * m1.predict(X) + (1 - w) * m2.predict(X)
    # evaluate preds (MSE, accuracy, ...) and keep the best w
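A minimal runnable sketch of that scan with soft probabilities might look like the following (assuming two classifiers and a held-out validation split; X_train, y_train, X_val, y_val are illustrative names, not from the original post):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Assumes X_train, y_train, X_val, y_val already exist (illustrative names).
m1 = LogisticRegression(max_iter=1000).fit(X_train, y_train)
m2 = RandomForestClassifier().fit(X_train, y_train)

for w in [0, 0.25, 0.5, 0.75, 1]:
    # Weighted average of class probabilities; the two weights already sum to 1.
    proba = w * m1.predict_proba(X_val) + (1 - w) * m2.predict_proba(X_val)
    preds = m1.classes_[np.argmax(proba, axis=1)]
    print(w, accuracy_score(y_val, preds))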

Your problem is exactly the same as this one, but using the VotingClassifier. You just need to optimize this array in a for (or search) loop:

weights=[5, 5, 10, 5, 6, 8, 4, 10]

In your case, the weight array is not normalized and there are more models, so the search will take longer. Also, the VotingClassifier refits every base model for each weight combination you try, which takes a long time, so it is easier to code the search yourself.
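For the eight-model soft ensemble above, a hedged sketch of the "code it yourself" idea would be to fit each base classifier once, cache its class probabilities on a validation split, and then try random normalized weight vectors without any refitting (the validation split names and the number of trials are assumptions for illustration):

import numpy as np
from sklearn.metrics import accuracy_score

# Assumes cl1..cl8 have already been fit on X_train, y_train
# (SVC must be created with probability=True to expose predict_proba),
# and that X_val, y_val form a held-out validation split.
fitted = [cl1, cl2, cl3, cl4, cl5, cl6, cl7, cl8]
probas = [clf.predict_proba(X_val) for clf in fitted]  # computed once, reused below

rng = np.random.default_rng(0)
best_weights, best_acc = None, -np.inf
for _ in range(200):                      # number of random trials is arbitrary
    w = rng.random(len(fitted))
    w /= w.sum()                          # normalize the weights to sum to 1
    blended = sum(wi * p for wi, p in zip(w, probas))
    preds = fitted[0].classes_[np.argmax(blended, axis=1)]
    acc = accuracy_score(y_val, preds)
    if acc > best_acc:
        best_weights, best_acc = w, acc

print(best_weights, best_acc)

The best_weights vector found this way can then be passed to VotingClassifier(..., voting='soft', weights=best_weights) for a final fit on the full training data.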

Answered by Carlos Mougan on June 5, 2021
