Going to preface this by saying: LLMs are cool and all, no doubt, but… well… not that cool. What really gets me excited are good (especially multilingual!) speech-to-text engines, otherwise known as ASR (automatic speech recognition). So that's where this is going.
So a few weeks ago, I spent a weekend building out a microphone component for Dash. I tried this 2 years ago, when I didn't know much JS, and failed miserably; 2 years later, I actually managed to do it.
Out of the box, this component will record audio and let you download the mp3 file (the "download" button and audio controls show up dynamically once there's audio there – feel free to judge my design choices, I'm not fussed). If you add a @server.route to your app, you can then send this audio wherever you want… for example, to an STT API endpoint.
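For concreteness, here's a minimal sketch of the kind of route I mean – assuming the component POSTs the recording as multipart form data under a field called "audio" (the route path and the field name here are placeholders, not the component's actual contract):

```python
from dash import Dash
from flask import request, jsonify

app = Dash(__name__)
server = app.server  # the underlying Flask instance that Dash runs on

@server.route("/upload-audio", methods=["POST"])
def upload_audio():
    # Grab the uploaded recording and save it; from here you could forward
    # the bytes to whatever STT endpoint you like.
    audio_file = request.files["audio"]
    audio_file.save("latest_recording.mp3")
    return jsonify({"status": "ok"})
```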
But why send it to one engine when you can send it to four (and probably someday more) and benchmark the results in real time? While recording the ground truth, the speaker, and any comments you want to make about that "shot"? OF COURSE that's what we'd do. (I like to benchmark, what can I say…)
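As for the scoring: word error rate is the obvious metric once you have a ground truth, and the jiwer package makes it a one-liner. Just a sketch – the transcripts and engine names below are made up, not actual results:

```python
from jiwer import wer

ground_truth = "send the audio wherever you want"
transcripts = {
    "whisper": "send the audio wherever you want",
    "deepgram": "send the audio where ever you want",
}

for engine, hypothesis in transcripts.items():
    # Lower is better: 0.0 means a perfect transcript.
    print(engine, round(wer(ground_truth, hypothesis), 3))
```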
This is what it looks like in its basic, not-really-very-styled form (also my first foray into Dash AG Grid, so that was exciting as well).
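The grid itself is about as simple as AG Grid gets – something along these lines (the column names are illustrative, not the exact ones in the app):

```python
import dash_ag_grid as dag
from dash import Dash, html

app = Dash(__name__)

app.layout = html.Div([
    dag.AgGrid(
        id="results-grid",
        columnDefs=[{"field": c} for c in ["engine", "transcript", "wer", "speaker", "comments"]],
        rowData=[],  # filled in by a callback once the transcripts come back
    )
])
```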
It currently has built-in functionality to test Whisper, Azure STT, AssemblyAI, and Deepgram (supposedly the best out there). NOTE THAT IT'S TESTING ASYNC (batch) transcription, not streaming, but I'm cool with that for now.
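To give a flavor of what one of those calls looks like, here's roughly the batch Whisper call via the OpenAI API – the other engines have their own SDKs but follow the same upload-a-file, get-text-back pattern. This is a sketch, not the exact code in the app:

```python
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

with open("latest_recording.mp3", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(result.text)  # the transcript to drop into the results grid
```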
Open to comments, suggestions, feedback, etc., even though I only make time to work on this very sporadically (once every few weeks). Also very open to throwing this up online for folks to test out – I run it locally with an env file for the API keys, but I've also built in a panel to enter API keys through the frontend, so it's actually ready if other people are interested in playing around with it.
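In case it helps anyone picture the key handling, it can be as simple as this – env file as the default, with a value typed into the frontend panel winning if it's there (variable names are illustrative):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # pulls API keys from a local .env file if present

def get_api_key(env_name: str, frontend_value: str | None) -> str | None:
    # Prefer a key entered through the UI panel; fall back to the env file.
    return frontend_value or os.getenv(env_name)

deepgram_key = get_api_key("DEEPGRAM_API_KEY", frontend_value=None)
```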