
Explainability and Autoencoders

Asked by Mariah on August 6, 2021

Suppose I have an autoencoder built as a two-stack LSTM that takes in sequences of $n$ features of length $m$.

Let’s say that the dimension of my encoding vector is $k$, so the architecture is of the form: $n \times m \to 1 \times k \to n \times m$.

I’m looking into how to construct some explainability metrics on the encoding part of the autoencoder. More specifically, I’d like to know which features impact each of the $k$ encoding entries I have.

Naively, one could vary one feature at a time, check the impact on the encoding, and see where it is greatest. This is computationally expensive and also neglects combinations of features.
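
For concreteness, here is a minimal sketch of that one-feature-at-a-time probe, assuming a PyTorch encoder `encoder` that maps a $(1, m, n)$ input sequence to a $(1, k)$ encoding (the names `encoder` and `x` are placeholders for illustration only):

```python
import torch

def perturbation_sensitivity(encoder, x, eps=0.1):
    """Return an (n, k) matrix: absolute change in each encoding entry
    when each feature is shifted by `eps` across all timesteps."""
    with torch.no_grad():
        z0 = encoder(x)                          # baseline encoding, shape (1, k)
        n = x.shape[-1]
        scores = torch.zeros(n, z0.shape[-1])
        for j in range(n):                       # one feature at a time
            x_pert = x.clone()
            x_pert[..., j] += eps
            scores[j] = (encoder(x_pert) - z0).abs().squeeze(0)
    return scores
```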

Do you know of any research or methods that can assist?

One Answer

Feature visualization/inversion might be a good tool here. Basically, instead of performing optimization on the weights, you perform an optimization on the inputs that maximizes each respective entry of your encoding vector. If you want to get fancy, you might try constructing an entire activation atlas.
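
To make the idea concrete, here is a minimal sketch of that input optimization, assuming a differentiable PyTorch encoder `encoder` that maps a $(1, m, n)$ sequence to a $(1, k)$ encoding; `encoder`, `m`, `n`, and `target_unit` are assumptions, not part of the original post:

```python
import torch

def maximize_encoding_entry(encoder, m, n, target_unit, steps=200, lr=0.1):
    """Gradient ascent on the input so that one latent unit is maximized."""
    encoder.eval()
    # Start from a random input sequence and optimize it directly.
    x = torch.randn(1, m, n, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        z = encoder(x)                               # (1, k) encoding vector
        # Maximize the chosen latent entry; the small L2 term keeps the input bounded.
        loss = -z[0, target_unit] + 1e-3 * x.pow(2).mean()
        loss.backward()
        opt.step()
    return x.detach()    # an input pattern that strongly excites unit `target_unit`
```

Repeating this for each of the $k$ entries gives you a prototypical input per latent dimension, which you can then inspect to see which features dominate it.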

The interpretml library also has a lot of good tools for model explainability, but they are more geared towards supervised models. Might still be worth checking out.
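
One way to bridge that gap is to treat each encoding entry as a regression target and explain a supervised surrogate model fit against it. The sketch below uses scikit-learn's permutation importance, but interpretml's glassbox models could be swapped in the same way; `encoder` and `X_seq` (a NumPy array of shape $(N, m, n)$) are assumed placeholders:

```python
import torch
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

def feature_importance_for_unit(encoder, X_seq, target_unit):
    """Rank the n input features by how much they drive one encoding entry."""
    # Encode every sequence once; z has shape (N, k).
    with torch.no_grad():
        z = encoder(torch.as_tensor(X_seq, dtype=torch.float32)).numpy()
    X_flat = X_seq.reshape(len(X_seq), -1)      # (N, m*n) flattened surrogate inputs
    y = z[:, target_unit]                       # the encoding entry to explain
    surrogate = GradientBoostingRegressor().fit(X_flat, y)
    result = permutation_importance(surrogate, X_flat, y, n_repeats=10)
    # Collapse per-(timestep, feature) scores back to one score per feature.
    n_features = X_seq.shape[2]
    return result.importances_mean.reshape(-1, n_features).sum(axis=0)
```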

Answered by David Marx on August 6, 2021
