Dash microphone for audio (STT engine comparison)

Going to preface this by saying: LLMs are cool and all, no doubt, but… well… not that cool. What gets me really excited is good (especially multilingual!) speech-to-text engines, otherwise known as ASR (automatic speech recognition). So that’s where this is going.

So a few weeks ago, I spent a weekend building out a microphone component for Dash. I tried this 2 years ago, when I didn’t know much JS, and failed miserably; 2 years later, I actually managed to do it :tada:

Out of the box, this component will record audio and allow you to download the mp3 file (the “download” button and audio controls show up dynamically once there is audio there; feel free to judge my design choices, I’m not fussed). If you add a @server.route to your app, you can then send this audio wherever you want… for example, to an STT API endpoint.

But why send it to 1, when you can send it to 4 (and probably someday more) and benchmark the results in real time? While recording the ground truth, the speaker, and any comments about that “shot” that you want to make? OF COURSE that’s what we’d do. (I like to benchmark, what can I say…)
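
The post doesn't show how the app fans one recording out to several engines. Here is a minimal sketch, assuming each engine is wrapped in a plain Python function that takes the saved file's path and returns a transcript. The wrapper functions and the `run_all_engines` name are hypothetical, not part of the app or any SDK.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

# Hypothetical: each engine is a plain function that takes the path of the saved
# recording and returns its transcript. Real wrappers around the Whisper / Azure /
# AssemblyAI / Deepgram SDKs are not shown here.
Engine = Callable[[str], str]


def _timed_call(engine: Engine, audio_path: str) -> Tuple[str, float]:
    """Run one engine; return (transcript or error message, elapsed seconds)."""
    start = time.perf_counter()
    try:
        text = engine(audio_path)
    except Exception as exc:  # one failing engine shouldn't sink the whole comparison
        text = f"ERROR: {exc}"
    return text, time.perf_counter() - start


def run_all_engines(audio_path: str, engines: Dict[str, Engine]) -> Dict[str, Tuple[str, float]]:
    """Send the same recording to every engine concurrently.

    The API calls are network-bound, so threads let the engines overlap
    instead of running back to back.
    """
    if not engines:
        return {}
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(_timed_call, fn, audio_path) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}
```

In a Dash callback you would call `run_all_engines(file_path, {...})` with your own wrappers and turn the (transcript, latency) pairs into grid rows. Catching exceptions per engine means a bad API key for one service still leaves the other results visible.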

This is what it looks like in the basic, not-really-very-styled form (also my first foray into Dash AG Grid, so that was exciting as well).

[Demo GIF: demoSTTcompare]

It currently has functionality built in to test Whisper, Azure STT, AssemblyAI, and Deepgram (supposedly the best out there). NOTE THAT IT’S TESTING ASYNC, not streaming, but I’m cool with that for now.
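
The post doesn't say how each engine's output is scored against the ground truth you type in. Word error rate (WER) is the usual metric for this: (substitutions + deletions + insertions) divided by the number of words in the reference. Here is a minimal sketch, assuming simple lowercase-and-split tokenization (no punctuation or number normalization, which real benchmarks usually add):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER via word-level Levenshtein distance, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty; WER is undefined")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat", "the bat sat")` is one substitution over three reference words, i.e. about 0.33. Note that WER can exceed 1.0 when the hypothesis has many extra words.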

Open to comments, suggestions, feedback, etc., even though I only make time to work on this very sporadically (once every few weeks). Also very open to throwing this up online for folks to test out – I run it locally with an env file for the API keys, but I’ve also built in a panel to put in API keys through the frontend, so it’s actually ready if other people are interested in playing around with it.
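
Here is a minimal sketch of how the two key sources mentioned above (an env file locally, a frontend panel otherwise) might be combined. It assumes the panel's value reaches the callback as a string (e.g. from a dcc.Input passed in as State) and that the env file has already been loaded into os.environ at startup (for instance with python-dotenv's load_dotenv()). The environment variable names and `resolve_api_key` are hypothetical.

```python
import os
from typing import Optional

# Hypothetical mapping from engine name to environment variable name;
# the real app's variable names aren't shown in the post.
ENV_VARS = {
    "whisper": "OPENAI_API_KEY",
    "azure": "AZURE_SPEECH_KEY",
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
}


def resolve_api_key(engine: str, frontend_key: Optional[str] = None) -> Optional[str]:
    """Prefer a key typed into the frontend panel; otherwise fall back to the
    environment. Returns None if neither source provides a key."""
    if frontend_key and frontend_key.strip():
        return frontend_key.strip()
    env_name = ENV_VARS.get(engine)
    return os.environ.get(env_name) if env_name else None
```

Letting the frontend value win means people trying a hosted copy can use their own keys without the server's keys ever being exposed.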

Ah, right, also: the code for the microphone component is on GitHub here:

with very minimal app code and instructions on how to run it. Haven’t really touched it since I got it working, so if someone who is better than me at component design wants to mark it up and tell me everything I did wrong, please do :laughing:

Hey! Love this project, awesome component. You should upload it to pypi.org so it's a pip project; then users can just pip install dash-micro instead of pip install micro-0.0.1.tar.gz. Regardless, you broke new ground: this is the first audio component I've seen with Dash :sunglasses::100:

Thanks! Uploading it to PyPI is on my to-do list; I need to write some unit tests (any unit tests) first…

I guess @danielcahall beat me to it with an audio component (gotta give credit where it’s due :wink:); I didn’t know about it until I’d already built this :laughing:, and the two are implemented a bit differently. And more choice is always a good thing!

@tbonethemighty thank you! Neat to see that someone else tackled this as well :slight_smile: I actually built the component for a similar use case: retrieving audio from the client, supplying it to an ASR/STT engine, and feeding the result as input to a conversational agent.

For anyone interested in comparing, my approach can be found here.

hi @tbonethemighty
Perfect timing :slight_smile:

I was just thinking about this the other day. Someone reached out to me with the goal of creating a Dash app that can record their audio and transcribe the recording to text, like you show in your gif.

Are you able to share the code for that app that you created?

I second what @PipInstallPython said: whenever you get a chance to add this to pypi.org, that would be great, since it would open up access to many more community members. Thank you for working on this.

Sure, no problem! Actually, the README in the GitHub repo I linked earlier has a simple app showing how to run this component with a callback. I assume this person already has a function to transcribe audio with a particular service…? OpenAI, Azure, AWS, whatever their tool of choice is. If so, all they have to do is pop that transcription API call into the callback.

Here is the full code for the minimal app:

import micro
from dash import Dash, callback, html, Input, Output

import os
import flask
from flask import request
from werkzeug.utils import secure_filename


app = Dash(__name__)
server = app.server

app.suppress_callback_exceptions = True

# Folder (relative to the working directory) where uploaded recordings are saved
UPLOAD_DIR = 'tmpfiles'

# Define a route for POST requests to '/upload_audio' if you want to send the file somewhere for processing.
# Note that this is NOT REQUIRED if all you want to do is record and download the audio on a local client machine.
@server.route('/upload_audio', methods=['POST'])
def handle_upload():
    if 'audioFile' not in request.files:
        return flask.jsonify({'error': 'No file part'}), 400

    # 'file' is a werkzeug FileStorage object from the POST-ed form data
    file = request.files['audioFile']

    # Sanitize the client-supplied name so it can't point outside UPLOAD_DIR
    filename = secure_filename(file.filename or '')
    if filename == '':
        return flask.jsonify({'error': 'No selected file'}), 400

    # Create the directory if it doesn't exist yet (listing a missing directory would raise)
    os.makedirs(UPLOAD_DIR, exist_ok=True)
    file_path = os.path.join(UPLOAD_DIR, filename)

    # Check if file exists and remove it to ensure overwrite -- app was originally not overwriting the existing file
    if os.path.exists(file_path):
        os.remove(file_path)

    file.save(file_path)

    return flask.jsonify({'message': 'File uploaded successfully', 'fileLoc': file_path}), 200


app.layout = html.Div([
    html.Div([
        micro.Micro(
            id='audioInput'
        ),
        html.Div(id='output'),
    ],
        style={"width": "20rem"}
    )
])


@callback(Output('output', 'children'), Input('audioInput', 'fileUrl'))
def display_output(value):

    if value is not None:

        # do something with the file here, e.g. send it to a transcription API

        return html.Div([
            html.P("You have saved the file at {}; use this fileUrl as the input to other functions.".format(value)),
        ])


if __name__ == '__main__':
    # app.run() replaces the deprecated app.run_server() in recent Dash versions
    app.run(debug=True)
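
To make "pop that transcription API call into the callback" concrete, here is a minimal sketch of a helper the callback could call with the value it receives. It assumes fileUrl holds the server-side path returned by /upload_audio (fileLoc), as the example's message suggests. The `transcriber` argument is hypothetical and stands in for whatever OpenAI/Azure/AWS wrapper the person already has. Nothing here is part of the micro component's API.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical signature: any function that takes the path of a saved audio file
# and returns its transcript as a string (e.g. a thin wrapper around a provider SDK).
Transcriber = Callable[[str], str]


def transcribe_upload(file_loc: Optional[str], transcriber: Transcriber) -> str:
    """Return a transcript for the file saved by /upload_audio, or a status message."""
    if not file_loc:
        return "No recording yet."
    path = Path(file_loc)
    if not path.is_file():
        return f"File not found: {file_loc}"
    return transcriber(str(path))
```

Inside display_output you would then return something like html.P(transcribe_upload(value, my_transcriber)), where my_transcriber is their existing function. Keeping the API call in a plain function also makes it easy to unit test without running Dash.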

They’re also more than welcome to reach out to me if they have questions :slight_smile:

Will try to find time this weekend to add a unit test or two, push the component to PyPI, and maybe host this app online so folks can play around with it.

A word of warning and to set expectations:
This component works async: you press the start and stop buttons on it, and only after you press stop is the full audio posted to the server for further processing. It’s not designed for real-time streaming audio, so it may not be suitable for something like a live conversational interface. (Full disclosure: in my day job I’m a customer implementation/integration engineer at a company that builds automated voice agents. If I were building one of those, I wouldn’t do it this way :wink:.) Just want to make that limitation clear…

Thank you for the code, @tbonethemighty, and thanks in advance for pushing it to PyPI.