Data Science Asked on June 5, 2021
I am working on kidney cancer patients' data with 5 imbalanced labels. The feature engineering part of the code covers normalization and oversampling. Eight base machine learning classifiers are used for the classification task, and two ensemble methods are then applied on top of them: hard voting and soft (weighted) voting. 10-fold cross-validation is used to validate the results.
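For reference, here is a minimal sketch of what the normalization and oversampling step could look like; the use of StandardScaler and imbalanced-learn's SMOTE, and the X_train, y_train names, are my assumptions, since that part of the code is not shown here:
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE  # assumption: imbalanced-learn handles the oversampling

# Scale the features, then oversample the minority classes (on the training split only)
X_train_scaled = StandardScaler().fit_transform(X_train)
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train_scaled, y_train)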
methods = ['Support Vector Machine', 'Logistic Regression', 'K Neighbors Classifier', 'Random Forest',
'Gaussian Naive Bayes', 'Linear Discriminant Analysis', 'Decision Tree', 'Gradient Boosting',
'soft_VotingClassifier','hard_VotingClassifier']
I would like to know how to tune the weights for the soft voting method.
Here are the code and results I have right now:
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import VotingClassifier
if method == 'soft_VotingClassifier':
    cl1 = LogisticRegression()
    cl2 = KNeighborsClassifier(n_neighbors=10)
    cl3 = RandomForestClassifier(max_depth=35)
    cl4 = GaussianNB()
    cl5 = LinearDiscriminantAnalysis()
    cl6 = DecisionTreeClassifier()
    # probability=True so SVC can supply the class probabilities that soft voting needs
    cl7 = SVC(C=0.1, gamma=0.0001, kernel='poly', probability=True)
    cl8 = GradientBoostingClassifier()
    # estimator names come from the methods list (method[i] would index characters of the string)
    estimator = [(methods[0], cl1), (methods[1], cl2), (methods[2], cl3), (methods[3], cl4),
                 (methods[4], cl5), (methods[5], cl6), (methods[6], cl7), (methods[7], cl8)]
    eclf = VotingClassifier(estimators=estimator,
                            voting='soft', weights=[5, 5, 10, 5, 6, 8, 4, 10])
if method == 'hard_VotingClassifier':
    cl1 = LogisticRegression()
    cl2 = KNeighborsClassifier(n_neighbors=10)
    cl3 = RandomForestClassifier(max_depth=35)
    cl4 = GaussianNB()
    cl5 = LinearDiscriminantAnalysis()
    cl6 = DecisionTreeClassifier()
    cl7 = SVC(kernel='linear', gamma='scale')
    cl8 = GradientBoostingClassifier()
    estimator = [(methods[0], cl1), (methods[1], cl2), (methods[2], cl3), (methods[3], cl4),
                 (methods[4], cl5), (methods[5], cl6), (methods[6], cl7), (methods[7], cl8)]
    eclf = VotingClassifier(estimators=estimator, voting='hard')
Confusion Matrix results on test data:
[50 5 1 4 2]
[ 0 13 1 0 3]
[ 0 1 2 1 1]
[ 4 0 0 2 0]
[ 0 3 0 0 2]
Accuracy result on test data:
For anyone who would like to check or run my code, I will share the repository link.
This is a classical optimization problem.
To keep it simple, let's say you have two models (m1 and m2) and you want to ensemble them.
The first idea is to take the mean of both:
(m1.predict(X) + m2.predict(X)) / 2
Then you want to find the best weighted combination of the two models:
for w in [0, 0.25, 0.5, 0.75, 1]:
    w * m1.predict(X) + (1 - w) * m2.predict(X)
    print(evaluate results / MSE / accuracy ...)
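As a runnable version of that loop for classification, here is a minimal sketch; it assumes two already-fitted classifiers m1 and m2 that expose predict_proba, and a held-out validation set X_val, y_val (hypothetical names, not taken from the post):
import numpy as np
from sklearn.metrics import accuracy_score

for w in [0, 0.25, 0.5, 0.75, 1]:
    # Weighted average of class-probability estimates (soft voting done by hand)
    proba = w * m1.predict_proba(X_val) + (1 - w) * m2.predict_proba(X_val)
    # Map the argmax column index back to the actual class labels
    y_pred = m1.classes_[np.argmax(proba, axis=1)]
    print(w, accuracy_score(y_val, y_pred))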
Your problem is exactly the same, just expressed through the voting classifier: you need to optimize this array, in a for loop.
weights=[5, 5, 10, 5, 6, 8, 4, 10]
In your case the array is not normalized and there are more models, so the search space is larger and it will take longer. Also, the voting classifier refits every base model for each candidate weight vector, so the search will be slow; it may be easier to code the weighted combination yourself.
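As an illustration of that search, here is a minimal sketch that tries a coarse grid of weight vectors for the soft VotingClassifier and keeps the one with the best 10-fold CV accuracy; the candidate values and the X_train, y_train names are my assumptions (with 8 models, even a 3-value grid already means 3^8 = 6561 candidates, each refit 10 times):
import itertools
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import VotingClassifier

best_score, best_weights = -np.inf, None
for weights in itertools.product([1, 2, 5], repeat=len(estimator)):
    eclf = VotingClassifier(estimators=estimator, voting='soft', weights=list(weights))
    score = cross_val_score(eclf, X_train, y_train, cv=10, scoring='accuracy').mean()
    if score > best_score:
        best_score, best_weights = score, weights
print(best_weights, best_score)
Since weights is just a constructor parameter, GridSearchCV over {'weights': [...]} can run the same search with n_jobs parallelism, and in practice scoring a random subset of the grid is usually enough.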
Answered by Carlos Mougan on June 5, 2021