
Can transformers be better than RNNs for online speech recognition?

Artificial Intelligence Asked on February 23, 2021

Do transformers have the potential to replace end-to-end RNN models for online speech recognition? This depends mainly on accuracy, latency, and deployment cost, not training cost. Can transformers support low-latency online use cases with deployment cost comparable to, and results better than, RNN models?

One Answer

Are there examples where transformers achieve better accuracy than end-to-end RNN models such as the RNN-transducer for speech recognition? Can transformers be used for online speech recognition, which requires low latency from the end of speech to the result? Do transformers have the potential to replace end-to-end RNN models for speech recognition in most cases in the future? This may depend mainly on accuracy and deployment cost, not training cost.

You can check Facebook's wav2letter results on all of this:

https://ai.facebook.com/blog/online-speech-recognition-with-wav2letteranywhere/

https://research.fb.com/publications/scaling-up-online-speech-recognition-using-convnets/

Transformers definitely have potential in speech, especially when combined with faster computation methods (hashing), just as in NLP.
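To make the "hashing" remark concrete: one plausible reading (an assumption, since the answer does not name a method) is locality-sensitive hashing attention of the kind used in Reformer, where each query only attends to keys that hash to the same bucket. The sketch below is a toy pure-Python illustration of that idea, not the wav2letter implementation; the function names `lsh_buckets` and `bucketed_attention` and the parameters `n_planes` and `seed` are made up for this example.

```python
import math
import random


def lsh_buckets(vectors, n_planes=4, seed=0):
    """Hash each vector to a bucket id using the signs of random projections."""
    rng = random.Random(seed)  # fixed seed so queries and keys share the same planes
    dim = len(vectors[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    buckets = []
    for vec in vectors:
        bucket = 0
        for plane in planes:
            dot = sum(v * p for v, p in zip(vec, plane))
            # One bit per hyperplane: which side of the plane the vector lies on.
            bucket = (bucket << 1) | (1 if dot >= 0 else 0)
        buckets.append(bucket)
    return buckets


def bucketed_attention(queries, keys, values, n_planes=4, seed=0):
    """Softmax attention restricted to keys in the same hash bucket as the query."""
    q_buckets = lsh_buckets(queries, n_planes, seed)
    k_buckets = lsh_buckets(keys, n_planes, seed)
    scale = 1.0 / math.sqrt(len(queries[0]))
    out_dim = len(values[0])
    outputs = []
    for query, q_bucket in zip(queries, q_buckets):
        idx = [j for j, k_bucket in enumerate(k_buckets) if k_bucket == q_bucket]
        if not idx:
            # Assumption for this sketch: a query with an empty bucket gets zeros.
            outputs.append([0.0] * out_dim)
            continue
        scores = [scale * sum(a * b for a, b in zip(query, keys[j])) for j in idx]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * values[j][d] for w, j in zip(weights, idx)) / total
             for d in range(out_dim)]
        )
    return outputs
```

Vectors pointing in similar directions tend to land in the same bucket, so the softmax runs over a small subset of keys instead of the whole sequence. This toy version still scans every key to find bucket members; a practical implementation would sort or group by bucket to get the actual speedup.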

The problem with transformers is that you need a lot of GPUs to train them.

Answered by Nikolay Shmyrev on February 23, 2021
