TransWikia.com

Proper training models and settings for building subtitles with the DeepSpeech GitHub project

Data Science Asked by so sa on February 14, 2021

I am a newbie in speech-to-text AI, but I am trying to find suitable DeepSpeech model settings for automatically extracting an SRT subtitle file from a video file. When I use deepspeech-0.6.1 with these settings via this Colab notebook:

!deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio test.wav   --extended --json  

it gives me fairly good text like this:

and people fishing up in this wall i disarrayed anaveni an derstand
you angle with the amidei and desol make it a relation to me now i was
angry at first to the twin tione can annihilate them how we can tear
down there or hiawatha known by a truly for policing the others and
tautening to no contraction i get it summoned and you can never the
world with the contents one man fight latest him you apaian why we
could have her own world i guess when behind they are creators to die
oh many a unnumbered things out on to donneraile mere the moment were
i to tipple gave them a long amain ere hardey created on a new nothing
pertinacity of monomania he is ugly man and this was his rage
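One thing worth checking before comparing the two releases: the DeepSpeech release models expect 16 kHz, 16-bit mono PCM WAV input, and a mismatched sample rate or channel count alone can degrade the transcript. A minimal sketch of a converter (the function name and file paths are my own, and the linear resampling is naive rather than production-grade):

```python
# Convert a WAV file to the 16 kHz, 16-bit mono PCM format that the
# DeepSpeech release models expect. Paths and names are illustrative.
import wave
import numpy as np

def to_deepspeech_wav(src_path, dst_path, target_rate=16000):
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    samples = np.frombuffer(frames, dtype=np.int16).astype(np.float64)
    if n_channels > 1:  # mix all channels down to mono
        samples = samples.reshape(-1, n_channels).mean(axis=1)
    if rate != target_rate:  # naive linear resampling
        new_len = int(len(samples) * target_rate / rate)
        samples = np.interp(
            np.linspace(0.0, len(samples) - 1, new_len),
            np.arange(len(samples)),
            samples,
        )
    out = samples.clip(-32768, 32767).astype(np.int16)
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(target_rate)
        dst.writeframes(out.tobytes())
```

For serious use, ffmpeg (`ffmpeg -i in.mp4 -ar 16000 -ac 1 test.wav`) does the same conversion with a proper resampler.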

But when I use this GitHub project:

https://github.com/abhirooptalasila/AutoSub

which uses deepspeech-0.8.2, with this kind of command:

!python3 autosub/main.py  --model /content/AutoSub/deepspeech-0.8.2-models.pbmm --scorer /content/AutoSub/deepspeech-0.8.2-models.scorer --file test.wav

it creates this kind of SRT file, which differs a lot from the deepspeech-0.6.1 result shown above (lower quality!):

1
00:00:00,15 --> 00:00:07,90
and people the tooth allies

2
00:00:09,15 --> 00:00:09,80
this

3
00:00:19,05 --> 00:00:53,65
and isaiah pan i understand your anger with him and maybe your ride out should exist

4
00:00:54,05 --> 00:00:55,25
a red

5
00:00:55,70 --> 00:00:56,10
and
So I don’t know what the difference is between using --model /content/AutoSub/deepspeech-0.8.2-models.pbmm --scorer /content/AutoSub/deepspeech-0.8.2-models.scorer and using --model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie, and how I could improve the settings to get better-quality subtitles for videos, perhaps by using better-trained models built from other open-source datasets, like the ones described here:
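For reference: from DeepSpeech 0.7 onward, the separate lm.binary and trie files were merged into a single .scorer package, so --scorer in 0.8.2 plays the same role as --lm plus --trie in 0.6.1. A hedged sketch of the equivalent direct invocation of the 0.8.2 client, bypassing AutoSub (the --lm_alpha/--lm_beta values below are illustrative, not tuned):

```shell
# Run the 0.8.2 client directly on the same audio; the .scorer file
# bundles the language model and trie that 0.6.1 took separately.
# lm_alpha/lm_beta weight the language model during decoding and are
# worth tuning if transcripts look off; the values here are examples.
deepspeech \
  --model deepspeech-0.8.2-models.pbmm \
  --scorer deepspeech-0.8.2-models.scorer \
  --lm_alpha 0.93 --lm_beta 1.18 \
  --audio test.wav --json
```

Comparing this output with AutoSub's would show whether the quality drop comes from the 0.8.2 model itself or from AutoSub's audio segmentation.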

2.1 Data Collection

We want to use some open-source datasets that are available online:

  • National Speech Corpus: Contains 2000 hours of locally accented audio and text transcriptions
  • LibriSpeech: Dataset consists of a large-scale corpus of around 1000 hours of English speech
  • TIMIT: A collection of recordings of 630 speakers of American English
  • L2-ARCTIC: A non-native English speech corpus
  • etc.
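If I understand the training docs correctly, fine-tuning on such a dataset goes through DeepSpeech.py with CSV manifests (wav_filename, wav_filesize, transcript). A rough sketch under that assumption, with all paths and hyperparameters purely illustrative:

```shell
# Hypothetical fine-tuning run on a custom corpus (e.g. LibriSpeech)
# prepared as DeepSpeech CSV manifests; every path and value below is
# an example, not a recommendation.
python3 DeepSpeech.py \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --test_files data/test.csv \
  --checkpoint_dir checkpoints/ \
  --epochs 3 \
  --learning_rate 0.0001 \
  --export_dir exported-model/
```

The exported model can then be passed to AutoSub or the deepspeech client in place of the stock .pbmm.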

Also asked here:

https://github.com/mozilla/DeepSpeech/issues/3330

Thanks.
