Advances in deep neural networks have helped us build far more accurate voice bots. Adding a voice channel to a chatbot requires two key services:
1) TTS (Text to Speech)
2) ASR/STT (Automatic Speech Recognition, or Speech to Text)
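The two services slot around an existing text chatbot: STT turns the caller's audio into text, the bot replies in text, and TTS turns that reply back into audio. The sketch below shows that flow for a single turn. The interface names (`SpeechToText`, `TextToSpeech`, `handle_voice_turn`) are hypothetical and not tied to any vendor; the fake classes exist only so the flow can be run without a real speech service.

```python
from typing import Callable, Protocol


class SpeechToText(Protocol):
    """Hypothetical ASR/STT interface: audio bytes in, transcript out."""

    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    """Hypothetical TTS interface: text in, audio bytes out."""

    def synthesize(self, text: str) -> bytes: ...


def handle_voice_turn(
    audio: bytes,
    stt: SpeechToText,
    chatbot: Callable[[str], str],
    tts: TextToSpeech,
) -> bytes:
    """Run one conversational turn: caller audio -> text -> bot reply -> audio."""
    user_text = stt.transcribe(audio)   # ASR/STT step
    reply_text = chatbot(user_text)     # the existing text chatbot, unchanged
    return tts.synthesize(reply_text)   # TTS step


class FakeSTT:
    """Stand-in STT for local testing: treats the 'audio' as UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeTTS:
    """Stand-in TTS for local testing: 'speaks' by upper-casing the text."""

    def synthesize(self, text: str) -> bytes:
        return text.upper().encode("utf-8")


if __name__ == "__main__":
    reply = handle_voice_turn(b"hello", FakeSTT(), lambda t: f"you said {t}", FakeTTS())
    print(reply)  # b'YOU SAID HELLO'
```

Keeping the chatbot as a plain text-in/text-out function means any of the commercial or open source engines listed here can be swapped in behind the two interfaces without touching the bot logic.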
For TTS, many commercial cloud services are available. A few examples:
- https://aws.amazon.com/polly/
- https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/#overview
- https://cloud.google.com/text-to-speech
- https://www.nuance.com/en-gb/omni-channel-customer-engagement/voice-and-ivr/text-to-speech.html
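As an illustration of what calling one of these cloud TTS services looks like, here is a minimal sketch for Amazon Polly using the boto3 `synthesize_speech` operation. The client is passed in rather than created inside the function so the code can be exercised without AWS credentials; `FakePollyClient` is a hypothetical test stand-in, and the default voice (`Joanna`) and output format (`mp3`) are arbitrary choices for the example.

```python
import io
from typing import Any, Dict


def synthesize_with_polly(
    polly_client: Any,
    text: str,
    voice_id: str = "Joanna",
    output_format: str = "mp3",
) -> bytes:
    """Convert text to audio bytes with Amazon Polly's SynthesizeSpeech API.

    polly_client is expected to be boto3.client("polly"); injecting it lets
    the function run against a fake client in tests.
    """
    response = polly_client.synthesize_speech(
        Text=text,
        OutputFormat=output_format,  # e.g. "mp3", "ogg_vorbis" or "pcm"
        VoiceId=voice_id,
    )
    # AudioStream is a streaming body; read the whole clip into memory.
    return response["AudioStream"].read()


class FakePollyClient:
    """Hypothetical stand-in for boto3.client("polly"), for local testing only."""

    def __init__(self) -> None:
        self.last_request: Dict[str, Any] = {}

    def synthesize_speech(self, **kwargs: Any) -> Dict[str, Any]:
        self.last_request = kwargs
        return {"AudioStream": io.BytesIO(b"fake-audio"), "ContentType": "audio/mpeg"}


if __name__ == "__main__":
    fake = FakePollyClient()
    audio = synthesize_with_polly(fake, "Hello from the bot")
    print(len(audio), fake.last_request["VoiceId"])  # 10 Joanna
```

With real credentials configured, the same function would be called as `synthesize_with_polly(boto3.client("polly"), "Hello")`; the other vendors expose similar text-in, audio-out calls through their own SDKs.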
Similarly, for speech recognition there are many commercial services available:
- https://aws.amazon.com/transcribe/
- https://azure.microsoft.com/en-us/services/cognitive-services/speech-to-text/
- https://cloud.google.com/speech-to-text
- https://www.nuance.com/en-gb/dragon.html
- https://www.speechly.com/
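Some of these STT services work asynchronously for recorded audio. Amazon Transcribe, for example, runs a batch job over a file stored in S3: you start the job, poll its status, and then fetch a JSON transcript. The sketch below shows that pattern with the boto3 `start_transcription_job` and `get_transcription_job` operations. The job name and S3 URI are placeholders, and `FakeTranscribeClient` is a hypothetical stand-in so the polling logic can run without AWS.

```python
import json
import time
from typing import Any, Dict


def start_transcription(client: Any, job_name: str, media_uri: str,
                        media_format: str = "wav", language_code: str = "en-US") -> None:
    """Start an Amazon Transcribe batch job for an audio file stored in S3."""
    client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},
        MediaFormat=media_format,
        LanguageCode=language_code,
    )


def wait_for_transcription(client: Any, job_name: str, poll_seconds: float = 5.0,
                           timeout_seconds: float = 600.0) -> str:
    """Poll the job until it finishes and return the transcript file URI."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        job = client.get_transcription_job(TranscriptionJobName=job_name)["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription failed"))
        time.sleep(poll_seconds)  # still QUEUED or IN_PROGRESS
    raise TimeoutError(f"job {job_name!r} did not finish within {timeout_seconds} s")


def extract_transcript(transcript_json: str) -> str:
    """Pull the plain-text transcript out of Transcribe's JSON output."""
    data = json.loads(transcript_json)
    return " ".join(t["transcript"] for t in data["results"]["transcripts"])


class FakeTranscribeClient:
    """Hypothetical stand-in for boto3.client("transcribe"): IN_PROGRESS once, then COMPLETED."""

    def __init__(self) -> None:
        self.polls = 0

    def start_transcription_job(self, **kwargs: Any) -> Dict[str, Any]:
        return {"TranscriptionJob": {"TranscriptionJobName": kwargs["TranscriptionJobName"],
                                     "TranscriptionJobStatus": "IN_PROGRESS"}}

    def get_transcription_job(self, TranscriptionJobName: str) -> Dict[str, Any]:
        self.polls += 1
        status = "IN_PROGRESS" if self.polls < 2 else "COMPLETED"
        return {"TranscriptionJob": {
            "TranscriptionJobName": TranscriptionJobName,
            "TranscriptionJobStatus": status,
            "Transcript": {"TranscriptFileUri": "https://example.com/transcript.json"},
        }}
```

The transcript URI returned by a real job points to a JSON document that still has to be downloaded before passing its text to `extract_transcript`. For live phone calls, a streaming recognition API is usually a better fit than batch jobs, since polling adds noticeable latency to every turn.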
If you are looking for open source TTS and STT solutions, the following projects look promising:
- MaryTTS - http://mary.dfki.de/
- Kaldi speech recognition - https://kaldi-asr.org/doc/
- SoX sound processing - http://sox.sourceforge.net/
- Botium Speech Processing (open source wrapper API) - https://github.com/codeforequity-at/botium-speech-processing
- NVIDIA Tacotron 2 (TTS) - https://github.com/NVIDIA/tacotron2
- NVIDIA WaveGlow (speech synthesis) - https://github.com/NVIDIA/waveglow
- Mozilla TTS - https://github.com/mozilla/TTS
- Mozilla DeepSpeech (speech to text) - https://github.com/mozilla/DeepSpeech
- Facebook wav2letter (STT) - https://github.com/flashlight/wav2letter
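A common job for SoX in this kind of setup is normalizing audio before it reaches an open source recognizer, since many models expect a specific input format. The sketch below assumes a target of 16 kHz, mono, 16-bit PCM WAV (the format Mozilla DeepSpeech's released English models expect; check the documentation of whichever engine you use). It builds a standard `sox` command line, runs it only if SoX is installed, and uses Python's `wave` module to verify a file's format. The helper names are hypothetical.

```python
import io
import shutil
import subprocess
import wave
from typing import BinaryIO, List, Union


def sox_resample_command(src: str, dst: str, rate: int = 16000) -> List[str]:
    """Build a SoX command that writes a mono, 16-bit WAV at `rate` Hz."""
    # SoX syntax: sox infile [output options] outfile
    # -r sets the sample rate, -c the channel count, -b the bits per sample.
    return ["sox", src, "-r", str(rate), "-c", "1", "-b", "16", dst]


def convert_with_sox(src: str, dst: str, rate: int = 16000) -> None:
    """Run SoX if it is on PATH; raises CalledProcessError if SoX fails."""
    if shutil.which("sox") is None:
        raise FileNotFoundError("sox executable not found on PATH")
    subprocess.run(sox_resample_command(src, dst, rate), check=True)


def is_asr_ready(wav_file: Union[str, BinaryIO], rate: int = 16000) -> bool:
    """Check that a WAV file is mono, 16-bit PCM at the expected sample rate."""
    with wave.open(wav_file, "rb") as w:
        return (w.getnchannels() == 1
                and w.getsampwidth() == 2   # 2 bytes per sample = 16-bit
                and w.getframerate() == rate)


def silent_wav(rate: int, channels: int, frames: int = 1600) -> io.BytesIO:
    """Build an in-memory 16-bit WAV of silence, handy for testing is_asr_ready."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * channels * frames)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(" ".join(sox_resample_command("input.mp3", "output.wav")))
    # sox input.mp3 -r 16000 -c 1 -b 16 output.wav
    print(is_asr_ready(silent_wav(16000, 1)), is_asr_ready(silent_wav(44100, 2)))
    # True False
```

Reading MP3 input depends on how SoX was built; WAV and FLAC input work with a default build. Passing the command as a list to `subprocess.run` avoids shell quoting issues with file names that contain spaces.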
A few articles that explain the setup: