
Should I use keras or sklearn for PCA?

Data Science Asked by shrijit on March 6, 2021

Recently I noticed that keras and sklearn have some overlapping functionality for data preprocessing.
So I am a bit confused about whether I should introduce a dependency on another library like sklearn just for basic data preprocessing, or stick with keras alone, since I am already using keras to build my models.
I would like to know how they compare for scenarios like:

  • which is good for production
  • which will give better and faster responses
  • is there any issue with introducing a dependency on another library for just one or two functions
  • which has better compatibility with tools like TensorBoard or libraries like matplotlib, seaborn, etc.

Thanks in advance.

3 Answers

which is good for production

They are both good. sklearn can be used in production just as much as tensorflow.keras.


which will give better and faster responses

I think that doesn't really depend on the library, but rather on the size of your models and your datasets; that is what really matters. Both libraries can be used to create very optimized and fast models.


is there any issue with introducing a dependency on another library for just one or two functions

There is no issue in using sklearn and tensorflow.keras together. In the ML/Data Science world they are probably the two most common tools, so no worries about that!


which has better compatibility with tools like TensorBoard or libraries like matplotlib, seaborn, etc.

Well, keras is now part of tensorflow (it ships as tensorflow.keras), and TensorBoard is designed specifically for it. Beyond that, visualization libraries such as matplotlib and seaborn are perfectly compatible with both.


Final thoughts:

You can use sklearn and keras in sequence without problems. Data preprocessing often draws on several libraries anyway, so don't worry about adding one more, especially one as solid and popular as sklearn.

However, you might want to substitute PCA with an autoencoder. It is arguably the strongest dimensionality reduction technique: it is non-linear, meaning it can carry more information in fewer variables, and it can be implemented directly in tensorflow.keras. That way you would have one neural network that generates a compressed representation of your data and another that makes the prediction. That's just a suggestion of course; you know your task better than anyone else.
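
To make that concrete, here is a minimal sketch of a Keras autoencoder used in place of PCA. The layer widths, the 8-dimensional bottleneck, and the random placeholder data are illustrative assumptions, not values from the answer:

```python
import numpy as np
from tensorflow import keras

n_features = 64      # assumed width of the input data
encoding_dim = 8     # assumed size of the compressed representation

# Encoder-decoder network trained to reconstruct its own input
inputs = keras.Input(shape=(n_features,))
encoded = keras.layers.Dense(32, activation="relu")(inputs)
encoded = keras.layers.Dense(encoding_dim, activation="relu")(encoded)
decoded = keras.layers.Dense(32, activation="relu")(encoded)
decoded = keras.layers.Dense(n_features, activation="linear")(decoded)

autoencoder = keras.Model(inputs, decoded)   # full network, trained on X -> X
encoder = keras.Model(inputs, encoded)       # reused for dimensionality reduction
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(1000, n_features)         # placeholder data
autoencoder.fit(X, X, epochs=20, batch_size=32, verbose=0)

X_compressed = encoder.predict(X)            # feed this to the predictive model
print(X_compressed.shape)                    # (1000, 8)
```

The encoder's output then plays the same role as the PCA scores when fed into the second network.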

Correct answer by Leevo on March 6, 2021

This is difficult to answer without more context about your exact scenario. Typically, though, it's not the best idea to add a large library to a project for just one piece of functionality, especially one as simple as PCA. PCA is fairly simple to implement, even with just NumPy, and you will probably be using NumPy already if you're using Keras. However, as you progress, if you find yourself needing more and more functionality from scikit-learn, then you should probably bring it in.
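
For a sense of how little code that is, here is a rough NumPy-only PCA sketch; the function name and the random data are hypothetical:

```python
import numpy as np

def pca_numpy(X, n_components):
    # Center the data, then project onto the top principal directions.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered matrix; rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

X = np.random.rand(500, 30)               # placeholder data
X_reduced = pca_numpy(X, n_components=5)
print(X_reduced.shape)                    # (500, 5)
```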

For production, it's hard to say without more context. There are always pros and cons.

As for faster responses, again it depends. Will network or disk I/O be your biggest bottleneck or not? There are a lot of open questions here.

Answered by dtorpey on March 6, 2021

What I would suggest is to build an sklearn pipeline in which one step is the sklearn PCA and the last step is your Keras model.

Sklearn pipelines are easy to put into production and can handle a lot more transformations.
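
Here is a minimal sketch of that pipeline, assuming the scikeras wrapper (not mentioned in the answer) to make the Keras model behave like an sklearn estimator; the layer sizes, component count, and placeholder data are illustrative:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scikeras.wrappers import KerasClassifier
from tensorflow import keras

N_COMPONENTS = 10  # assumed number of principal components

def build_model():
    # Input width matches the PCA output.
    model = keras.Sequential([
        keras.Input(shape=(N_COMPONENTS,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=N_COMPONENTS)),
    ("net", KerasClassifier(model=build_model, epochs=10, verbose=0)),
])

X, y = np.random.rand(200, 50), np.random.randint(0, 2, 200)  # placeholder data
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```

The whole pipeline can then be treated as a single estimator when you deploy it.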

Answered by Carlos Mougan on March 6, 2021
