Tech Talk: Ruminating on TTS and ASR (STT)

Advances in deep neural networks have helped us build far more accurate voice bots. Adding a voice channel to a chatbot requires two services:

1) TTS (Text to Speech)

2) ASR/STT (Automatic Speech Recognition or Speech to Text)
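
As a rough illustration of how these two services sit around a text chatbot, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the VoiceChannel class, the three callable types, and the fake_stt, fake_tts and echo_bot stand-ins are hypothetical names, not part of any library mentioned in this post. A real setup would replace the stand-ins with calls to an actual ASR engine and an actual TTS engine.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical shapes: any ASR or TTS engine can be adapted to these signatures.
SpeechToText = Callable[[bytes], str]   # audio in, transcript out
TextToSpeech = Callable[[str], bytes]   # text in, audio out
ChatBot = Callable[[str], str]          # user text in, reply text out


@dataclass
class VoiceChannel:
    """Wraps a text chatbot with ASR on the way in and TTS on the way out."""
    stt: SpeechToText
    bot: ChatBot
    tts: TextToSpeech

    def handle(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)   # 1) speech to text
        reply = self.bot(transcript)      # 2) existing chatbot logic, unchanged
        return self.tts(reply)            # 3) text to speech


# Stand-in engines (assumptions) so the sketch runs without a speech library.
# They treat the "audio" bytes as UTF-8 text instead of real sound samples.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def echo_bot(text: str) -> str:
    return f"You said: {text}"


channel = VoiceChannel(stt=fake_stt, bot=echo_bot, tts=fake_tts)
```

Calling channel.handle(audio_bytes) runs one turn of the conversation. Because the chatbot only ever sees text, the ASR and TTS engines can be swapped (cloud service or open source) without touching the bot itself.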

For TTS, many commercial cloud services are available; a few examples are given below:

Similarly, many commercial services are available for speech recognition:

If you are looking for open source solutions for TTS and STT, the following projects look promising:

MaryTTS - http://mary.dfki.de/
Kaldi Speech recognition - https://kaldi-asr.org/doc/
SoX sound processing - http://sox.sourceforge.net/
Open Source Wrapper API - https://github.com/codeforequity-at/botium-speech-processing
NVIDIA TTS - https://github.com/NVIDIA/tacotron2
NVIDIA Speech Synthesis - https://github.com/NVIDIA/waveglow
Mozilla TTS - https://github.com/mozilla/TTS
Mozilla Speech to Text - https://github.com/mozilla/DeepSpeech
Facebook STT - https://github.com/flashlight/wav2letter

A few articles that explain the setup:

Tuesday, December 14, 2021
