Data Science Asked by Mutatos on December 14, 2020
I get different results for the same kernel with Python 2.7 (my local machine) and Python 3 (the system running on Kaggle) for LogisticRegression. How is this possible?
Here are my results from my local machine:
And here are the results from the Kaggle notebook:
The amount of data differs because I split off a bit more for the training data, but the predictions are completely different. Can this be caused by the Python version?
Assuming you are using sklearn: if you look at the source on GitHub, you will see that the LogisticRegression implementations do not differ between the two versions.
But you have variability elsewhere. For example, sklearn's train_test_split also takes a random seed, so your data may simply be split differently between the two runs (you could use cross-validation, for example, so that all of the data appears in both training and testing).
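A minimal sketch of that idea, assuming a placeholder feature matrix X and labels y (your own data would go here) and scikit-learn's LogisticRegression:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Placeholder data; substitute your own feature matrix X and labels y.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Fixing random_state makes the split identical on every machine and Python version.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))

# Cross-validation uses every row for both training and testing across folds,
# which reduces the dependence on any one particular split.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("cross-validated accuracy:", scores.mean())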
Answered by Noah Weber on December 14, 2020
The main problem was the ordinal data transformation. Python 2.7 somehow detected the ordinal data and worked with it. In Python 3 I have to convert the ordinal data explicitly with:
from sklearn.preprocessing import OrdinalEncoder

encoder = OrdinalEncoder()
# OrdinalEncoder expects a 2-D array, so reshape the column before encoding
# and flatten the result back to 1-D for the DataFrame column.
train_df.F1 = encoder.fit_transform(train_df.F1.values.reshape(-1, 1)).ravel()
Now it works with Python 3 as well!
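A minimal extension of the same idea, assuming a hypothetical test_df with the same F1 column (names here are illustrative): fit the encoder on the training data once and reuse it on the test data, so both frames share the same category-to-integer mapping.

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data frames standing in for the real train_df / test_df.
train_df = pd.DataFrame({"F1": ["low", "medium", "high", "medium"]})
test_df = pd.DataFrame({"F1": ["high", "low"]})

encoder = OrdinalEncoder()
# Fit on the training column only, then reuse the fitted mapping on the test
# column, so each category maps to the same integer code in both frames.
train_df["F1"] = encoder.fit_transform(train_df[["F1"]]).ravel()
test_df["F1"] = encoder.transform(test_df[["F1"]]).ravel()
print(train_df["F1"].tolist(), test_df["F1"].tolist())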
Answered by Mutatos on December 14, 2020