
Should I use keras or sklearn for PCA?

Data Science Asked by shrijit on March 6, 2021

Recently I noticed that keras and sklearn have some overlapping functionality for data preprocessing.
So I am a bit confused about whether I should introduce a dependency on another library like sklearn just for basic data preprocessing, or stick with keras alone, since I am already using keras to build my models.
I would like to know how they compare for scenarios like:

  • which is good for production
  • which will give better and faster responses
  • is there any issue with introducing a dependency on another library for just one or two functions
  • which has better compatibility with tools like TensorBoard or libraries like matplotlib, seaborn, etc.

Thanks in advance.

3 Answers

which is good for production

They are both good. sklearn can be used in production just as much as tensorflow.keras.


which will give better and faster responses

I think that doesn't really depend on the library, but rather on the size of your models and your datasets; that is what really matters. Both libraries can be used to create very optimized and fast models.


is there any issue with introducing a dependency on another library for just one or two functions

There is no issue in using sklearn and tensorflow.keras together. In the ML/Data Science world they are probably the two most common tools, so no worries about that!


which has better compatibility with tools like TensorBoard or libraries like matplotlib, seaborn, etc.

Well, keras is now part of tensorflow (it ships as tensorflow.keras), and TensorBoard is designed specifically for it. Beyond that, visualization libraries such as matplotlib and seaborn are perfectly compatible with both.


Final thoughts:

You can use sklearn and keras in sequence without problems. Data preprocessing often draws on several libraries anyway, so don't worry about adding one more, especially one as solid and popular as sklearn.

However, you might want to substitute PCA with an autoencoder. It is arguably the strongest dimensionality reduction technique: it is non-linear, meaning it can carry more information in fewer variables, and it can be implemented directly in tensorflow.keras. That way you would have one neural network that generates a compressed representation of your data and another that makes the prediction. That's just a suggestion of course; you know your task better than anyone else.
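
To make that concrete, here is a minimal sketch of a Keras autoencoder used in place of PCA. The layer widths, the 8-dimensional bottleneck, and the random placeholder data are illustrative assumptions, not values from the answer:

```python
import numpy as np
from tensorflow import keras

n_features = 64      # assumed width of the input data
encoding_dim = 8     # assumed size of the compressed representation

# Encoder-decoder network trained to reconstruct its own input
inputs = keras.Input(shape=(n_features,))
encoded = keras.layers.Dense(32, activation="relu")(inputs)
encoded = keras.layers.Dense(encoding_dim, activation="relu")(encoded)
decoded = keras.layers.Dense(32, activation="relu")(encoded)
decoded = keras.layers.Dense(n_features, activation="linear")(decoded)

autoencoder = keras.Model(inputs, decoded)   # full network, trained on X -> X
encoder = keras.Model(inputs, encoded)       # reused for dimensionality reduction
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(1000, n_features)         # placeholder data
autoencoder.fit(X, X, epochs=20, batch_size=32, verbose=0)

X_compressed = encoder.predict(X)            # feed this to the predictive model
print(X_compressed.shape)                    # (1000, 8)
```

The encoder's output then plays the same role as the PCA scores when fed into the second network.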

Correct answer by Leevo on March 6, 2021

This is difficult to answer without more context about your exact scenario. Typically, though, it's not the best idea to add a large library to a project for just one piece of functionality, especially one as simple as PCA. PCA is fairly simple to implement, even with just NumPy, and you will probably be using NumPy already if you're using Keras. However, as you progress, if you find yourself needing more and more functionality from scikit-learn, then you should probably bring it in.
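
For a sense of how little code that is, here is a rough NumPy-only PCA sketch; the function name and the random data are hypothetical:

```python
import numpy as np

def pca_numpy(X, n_components):
    # Center the data, then project onto the top principal directions.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered matrix; rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

X = np.random.rand(500, 30)               # placeholder data
X_reduced = pca_numpy(X, n_components=5)
print(X_reduced.shape)                    # (500, 5)
```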

For production, it's hard to say without more context. There are always pros and cons.

As for faster responses, again it depends. Will network or disk I/O be your biggest bottleneck or not? There are a lot of open questions here.

Answered by dtorpey on March 6, 2021

What I would suggest is to build an sklearn pipeline in which one step is the sklearn PCA and the last step is your Keras model.

Sklearn pipelines are easy to put into production and can handle a lot more transformations.
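
Here is a minimal sketch of that pipeline, assuming the scikeras wrapper (not mentioned in the answer) to make the Keras model behave like an sklearn estimator; the layer sizes, component count, and placeholder data are illustrative:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scikeras.wrappers import KerasClassifier
from tensorflow import keras

N_COMPONENTS = 10  # assumed number of principal components

def build_model():
    # Input width matches the PCA output.
    model = keras.Sequential([
        keras.Input(shape=(N_COMPONENTS,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=N_COMPONENTS)),
    ("net", KerasClassifier(model=build_model, epochs=10, verbose=0)),
])

X, y = np.random.rand(200, 50), np.random.randint(0, 2, 200)  # placeholder data
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```

The whole pipeline can then be treated as a single estimator when you deploy it.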

Answered by Carlos Mougan on March 6, 2021
