Data Science Asked by so sa on February 14, 2021
I am new to speech-to-text AI, and I am trying to find suitable DeepSpeech models and settings for automatically extracting an SRT subtitle file from a video file. When I use deepspeech-0.6.1 with the following settings in this Colab notebook:
!deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio test.wav --extended --json
it gives me better text, like this:
and people fishing up in this wall i disarrayed anaveni an derstand
you angle with the amidei and desol make it a relation to me now i was
angry at first to the twin tione can annihilate them how we can tear
down there or hiawatha known by a truly for policing the others and
tautening to no contraction i get it summoned and you can never the
world with the contents one man fight latest him you apaian why we
could have her own world i guess when behind they are creators to die
oh many a unnumbered things out on to donneraile mere the moment were
i to tipple gave them a long amain ere hardey created on a new nothing
pertinacity of monomania he is ugly man and this was his rage
But when I use this GitHub project:
https://github.com/abhirooptalasila/AutoSub
which uses deepspeech-0.8.2, with this kind of command:
!python3 autosub/main.py --model /content/AutoSub/deepspeech-0.8.2-models.pbmm --scorer /content/AutoSub/deepspeech-0.8.2-models.scorer --file test.wav
it creates this kind of SRT file, which differs a lot from the deepspeech-0.6.1 result shown above (lower quality!):
1
00:00:00,15 --> 00:00:07,90
and people the tooth allies

2
00:00:09,15 --> 00:00:09,80
this

3
00:00:19,05 --> 00:00:53,65
and isaiah pan i understand your anger with him and maybe your ride out should exist

4
00:00:54,05 --> 00:00:55,25
a red

5
00:00:55,70 --> 00:00:56,10
and

…
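For reference, SRT cues conventionally use timestamps of the form HH:MM:SS,mmm (three-digit milliseconds), whereas the cues above show only two digits after the comma. Below is a minimal sketch of writing a cue in the conventional form. The function names `srt_timestamp` and `srt_cue` are hypothetical and not part of AutoSub or DeepSpeech, and the sample values assume that the two digits shown above are hundredths of a second.

```python
def srt_timestamp(seconds: float) -> str:
    """Format an offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue: index line, time range line, text line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


if __name__ == "__main__":
    # Assumption: "00:00:09,15" in the output above means 9.15 seconds.
    print(srt_cue(2, 9.15, 9.80, "this"))
```

Cues are separated by a blank line in the file, so joining the strings returned by `srt_cue` with "\n" yields a complete subtitle file.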
So I don't know what the difference is between using the
--model /content/AutoSub/deepspeech-0.8.2-models.pbmm --scorer /content/AutoSub/deepspeech-0.8.2-models.scorer
setting and the
--model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie
setting, or how I could improve the settings to get better-quality subtitles for videos, for example by using better-trained models built from other open-source datasets, as described here:
2.1 Data Collection
We want to use some open-source datasets, that are available online
or by other such methods.
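To compare two setups by more than eye, one common option is word error rate (WER): the word-level edit distance between a recognizer's output and a hand-corrected reference transcript, divided by the number of reference words. The sketch below is a plain-Python illustration. The function name `word_error_rate` is hypothetical, and it assumes a reference transcript of the audio is available, which is not shown in this question.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder strings only; substitute a real reference and model output.
    print(word_error_rate("a b c d", "a x c"))
```

A lower value means the output is closer to the reference. Running the same audio through both commands and scoring each result against the same reference gives one number per setup.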
Also asked here:
https://github.com/mozilla/DeepSpeech/issues/3330
Thanks.