Streaming in Dash like ChatGPT

I have just added a new SSE component to the dash-extensions library. It is designed to make it easy to stream text to Dash applications, similar to how the ChatGPT UI works. Here is a minimal example:

import os
from dash_extensions.enrich import DashProxy, html, dcc, Input, Output, State
from dash_extensions import SSE
from dash.exceptions import PreventUpdate
from dash_extensions.streaming import sse_options

from models import Query

app = DashProxy()
app.layout = html.Div(
    [
        dcc.Input(id="query", value="What is Plotly Dash?", type="text"),
        html.Button("Submit", id="submit"),
        dcc.Markdown(id="response", dangerously_allow_html=True, dedent=False),
        # Configure the SSE component to concatenate the stream values, and animate the result.
        SSE(id="sse", concat=True, animate_chunk=5, animate_delay=10),
    ]
)
# Expose server variable for gunicorn to run the server.
server = app.server
# Configure the stream URL dependent on the environment.
stream_url = os.getenv("STREAM_URL", "http://127.0.0.1:8000/stream")


@app.callback([Output("sse", "url"), Output("sse", "options")], Input("submit", "n_clicks"), State("query", "value"))
def submit_query(n_clicks, query) -> tuple[str, dict[str, list[dict[str, str]]]]:
    if n_clicks is None:
        raise PreventUpdate
    # Create message structure to be consumed by the Azure OpenAI API.
    messages = [{"role": "user", "content": query}]
    # Pass the messages to the stream endpoint to trigger streaming.
    return stream_url, sse_options(payload=Query(messages=messages))


# Render (concatenated, animated) text from the SSE component.
app.clientside_callback("function(x){return x};", Output("response", "children"), Input("sse", "animation"))


if __name__ == "__main__":
    app.run_server()

which yields,

[Animated GIF: the response streaming into the app, ChatGPT-style]

You can see the complete code, including the streaming server (also written in Python), along with dockerization of the complete solution here. You’ll need an Azure OpenAI model deployment to run the example.

The component has not yet been added to the official documentation, but I expect to add it soon :slight_smile:


Cool stuff :slight_smile: Quick question: do server-sent events handle multiple users/sessions properly, or would events be sent to all connected users?

Each user/session is isolated. The current setup does not provide any built-in functionality to communicate between/to all users/sessions.

This is very cool, thank you for sharing @Emil!

Is there a reason why for the streaming server you use FastAPI rather than Flask? See e.g. GitHub - tieandrews/dashGPT: A high quality chat interface built entirely with Plotly Dash incorporating functionality for RAG, feedback and more. which seems to follow this sort of approach: python - How to implement server push in Flask framework? - Stack Overflow.

I guess that there’s maybe some scalability issues or similar with the pure-Flask approach but can’t find anywhere what the practical limitations actually are :thinking:

I used FastAPI as the streaming server as it provides proper async support. With Flask being sync, each open stream will block a Flask instance. Hence with typical setups where you have a few workers, you'll only be able to serve a client or two before the application grinds to a halt. You can mitigate this to some degree by using (a lot of) threads, but it's not ideal.
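To make the async point concrete, here is a minimal sketch of what a streaming endpoint boils down to (the `format_sse` helper and `token_stream` generator are illustrative names, not part of dash-extensions). While the coroutine awaits between chunks, the event loop is free to serve other clients, which is why an async server handles many concurrent SSE streams:

```python
import asyncio

def format_sse(data: str) -> str:
    # Wrap a chunk in the text/event-stream wire format.
    return f"data: {data}\n\n"

async def token_stream(tokens):
    # While awaiting, the event loop can serve other requests --
    # this is why async servers scale well for SSE.
    for token in tokens:
        yield format_sse(token)
        await asyncio.sleep(0)  # stand-in for waiting on the next model token

# With FastAPI (assumed installed), the endpoint would look roughly like:
#
# from fastapi import FastAPI
# from fastapi.responses import StreamingResponse
#
# app = FastAPI()
#
# @app.post("/stream")
# async def stream():
#     return StreamingResponse(token_stream(["Hello", " world"]),
#                              media_type="text/event-stream")
```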

Reading up on the topic, I just became aware of the option to introduce async capabilities via gevent (and monkey patching). With this solution, I believe it would be possible to achieve scalability much closer to that of FastAPI while using Flask (i.e. the server already serving Dash). When I find the time, I'll take a closer look and post an example here if it turns out to work well :slight_smile:


Thank you @Emil. What you say about sync vs. async is what I had guessed would be the case, but then I was surprised that no one who used Flask as the streaming server had previously discussed this limitation.

I was also wondering about doing this through gunicorn with --worker-class gevent (see Settings — Gunicorn 23.0.0 documentation). I’d be very curious what you find out but my (very uninformed) hope would be that Dash + gunicorn + gevent works similarly to Dash + FastAPI for the purposes of streaming.

Edit: I see that the gevent docs say “Some frameworks, such as gunicorn, handle monkey-patching for you. Check their documentation to be sure.”
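For reference, a typical invocation along these lines (assuming the Dash app lives in `app.py` and exposes a `server` variable, as in the example above) might look like this; the module path and bind address are illustrative:

```shell
# Install the gevent worker class alongside gunicorn.
pip install gunicorn gevent
# Serve the Dash app with cooperative (gevent) workers, so long-lived
# SSE connections don't each pin an entire worker process.
gunicorn app:server --worker-class gevent --bind 127.0.0.1:8000
```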

I just played around with it for a bit, and it seems to work nicely (though I haven't tested it at scale), with the main caveat that you can't use the Flask debug server. Instead, you'll have to use e.g. gunicorn for development. I have updated the example on the docs to be a single self-contained service.
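A sketch of the single-service idea: a plain generator produces the SSE frames, and Flask streams whatever it yields. Under a gevent worker, the blocking sleep cooperatively yields to other greenlets instead of stalling the process. The function names here are illustrative, and the Flask wiring (commented) assumes Flask is installed:

```python
import time

def sse_frames(tokens, delay=0.0):
    # Yield each token in text/event-stream wire format.
    for token in tokens:
        yield f"data: {token}\n\n"
        if delay:
            time.sleep(delay)  # cooperative under gevent monkey patching

# Flask usage (commented sketch):
#
# from flask import Flask, Response
# server = Flask(__name__)
#
# @server.route("/stream", methods=["POST"])
# def stream():
#     return Response(sse_frames(["Hello", " world"]),
#                     mimetype="text/event-stream")
```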

That change made it possible to embed the app (like the rest of the examples) in the main application, thus making the streaming interactive! Thanks for the hint @antony.milne :grinning_face:


Super cool! I can imagine this would be quite useful. We’ve been exploring adding SSE support to core Dash for some time so I’m excited to test drive your implementation. Cheers!


Very cool @Emil, thanks so much for figuring this out! I’m definitely going to play around with this some more.

I just tried running the simple example you linked to with the Flask server (i.e. just app.run()) and it worked ok as far as I could tell - what am I missing? :thinking: