Data Science Asked by shrijit on March 6, 2021
Recently I saw that there is some overlap in functionality between keras and sklearn regarding data preprocessing.
So I am a bit confused about whether I should introduce a dependency on another library like sklearn for basic data preprocessing, or stick with only keras, since I am already using keras to build my models.
I would like to know the difference for scenarios like:

- which is good for production
- which will give me a better and faster response
- whether there is any issue with introducing a dependency on another library for just one or two pieces of functionality
- which has better compatibility with other tools like TensorBoard or libraries like matplotlib, seaborn, etc.

Thanks in advance.
> which is good for production

They are both good. sklearn can be used in production just as much as tensorflow.keras.
> which will give me a better and faster response

I think that doesn't really depend on the library, but rather on the size of your models and of your datasets. That's what really matters. Both modules can be used to create very optimized and fast models.
> is there any issue with introducing a dependency on other libraries for just 1 or 2 pieces of functionality

There are no issues in using sklearn and tensorflow.keras together. In the ML/Data Science world they are probably the two most common tools. No worries about that!
> which has better compatibility with other tools like TensorBoard or libraries like matplotlib, seaborn, etc.

Well, keras is now a branch of tensorflow (it's tensorflow.keras). TensorBoard is designed specifically for it; a sketch of hooking it up follows below. Other than that, all other visualization libraries such as matplotlib and seaborn are perfectly compatible.
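For illustration, here is a minimal sketch of attaching TensorBoard to a keras training run. The tiny model, the random data, and the `logs/run1` directory are all illustrative assumptions, not from the original answer:

```python
# Minimal sketch: logging a keras training run to TensorBoard.
# Model, data, and log directory are illustrative placeholders.
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# The TensorBoard callback writes metrics and graphs to log_dir.
tb = keras.callbacks.TensorBoard(log_dir="logs/run1")

X, y = np.random.rand(100, 4), np.random.rand(100)
model.fit(X, y, epochs=5, callbacks=[tb], verbose=0)
# Then inspect the run with: tensorboard --logdir logs
```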
Final thoughts: you can use sklearn and keras in sequence without problems. Data preprocessing steps can use a lot more libraries, so don't worry about adding one more, especially one as solid and popular as sklearn.
However, you might want to substitute PCA with an autoencoder. That's arguably the best dimensionality reduction technique: it's non-linear, meaning it can carry more information with fewer variables, and it can be implemented in tensorflow.keras. That way you'd have one neural network that generates a compressed representation of your data, and another that makes the prediction. That's just a suggestion of course; you know your task better than anyone else.
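For concreteness, here is a minimal sketch of such an autoencoder in tensorflow.keras. The layer sizes, the 50-feature input, and the 10-dimensional bottleneck are illustrative assumptions; the original answer does not specify an architecture:

```python
# Minimal sketch of a dimensionality-reducing autoencoder in
# tensorflow.keras. Shapes and layer sizes are illustrative.
import numpy as np
from tensorflow import keras

n_features, n_latent = 50, 10  # compress 50 inputs down to 10

inputs = keras.Input(shape=(n_features,))
encoded = keras.layers.Dense(32, activation="relu")(inputs)
latent = keras.layers.Dense(n_latent, activation="relu")(encoded)
decoded = keras.layers.Dense(32, activation="relu")(latent)
outputs = keras.layers.Dense(n_features, activation="linear")(decoded)

# Train the full autoencoder to reconstruct its own input.
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(1000, n_features)
autoencoder.fit(X, X, epochs=10, batch_size=32, verbose=0)

# The encoder alone yields the compressed representation to feed
# into a second, predictive network.
encoder = keras.Model(inputs, latent)
X_compressed = encoder.predict(X)
print(X_compressed.shape)  # (1000, 10)
```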
Correct answer by Leevo on March 6, 2021
This is difficult to answer without more context about your exact scenario. Typically, though, it's not the best idea to add a large library to a project for just one piece of functionality, especially if it's something as simple as PCA. PCA is fairly simple to implement with just NumPy (a sketch follows below), and you will probably be using NumPy anyway if you're using Keras. However, as you progress, if you find yourself needing more and more of the functionality in scikit-learn, then you should probably bring it in.
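As an illustration of how little code PCA needs, here is a minimal NumPy-only sketch; the function name and the 2-component choice are illustrative, not from the answer:

```python
# Minimal PCA sketch using plain NumPy: center the data, then
# project onto the top eigenvectors of the covariance matrix.
import numpy as np

def pca(X, n_components):
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: cov is symmetric
    order = np.argsort(eigvals)[::-1]       # descending by variance
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components

X = np.random.rand(100, 5)
X_reduced = pca(X, n_components=2)
print(X_reduced.shape)  # (100, 2)
```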
For production, it's hard to say without more context; there are always pros and cons.
As for a faster response, it again depends: will network or disk I/O be your biggest bottleneck, or not? There are lots of questions to answer first.
Answered by dtorpey on March 6, 2021
What I would suggest is to build a sklearn pipeline in which one step is the sklearn PCA and the last step is your Keras model.
Sklearn pipelines are easy to put into production and can handle many more transformations.
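A minimal sketch of what that pipeline could look like. The KerasClassifier wrapper from tensorflow.keras.wrappers.scikit_learn is an assumption based on the era of this post (it has since been superseded by scikeras.wrappers.KerasClassifier); the model architecture and data shapes are also illustrative:

```python
# Minimal sketch: sklearn Pipeline ending in a wrapped Keras model.
# On newer TensorFlow versions, import KerasClassifier from
# scikeras.wrappers instead.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from tensorflow import keras
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier

def build_model():
    # Small classifier on the 10 PCA components produced upstream.
    model = keras.Sequential([
        keras.layers.Dense(32, activation="relu", input_shape=(10,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

pipeline = Pipeline([
    ("scale", StandardScaler()),   # PCA benefits from standardized inputs
    ("pca", PCA(n_components=10)),
    ("clf", KerasClassifier(build_fn=build_model, epochs=10,
                            batch_size=32, verbose=0)),
])

# X, y are placeholders for your own training data.
X = np.random.rand(200, 50)
y = np.random.randint(0, 2, size=200)
pipeline.fit(X, y)
print(pipeline.predict(X[:5]))
```

The whole pipeline object can then be fit, evaluated, and shipped as a single estimator, which is what makes this pattern convenient in production.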
Answered by Carlos Mougan on March 6, 2021