Handling Slow or Crashing Dash Apps with Dynamic SQL Queries in Databricks

I’m trying to connect my Dash app to a Databricks SQL Warehouse, but I’m facing issues when executing SQL queries dynamically through callbacks. Whenever I input a new value to filter the data, the page either crashes or takes too long to load. I’ve tried optimizing the SQL query, but it’s still not working. How can I efficiently handle dynamic SQL queries in Dash without overloading the system or causing timeouts?

For reference, I’ve looked into (web.dash.plotly.com/databricks-integration/sql-connector), but the issue persists. Any advice on improving performance or optimizing queries would be really helpful!

Here’s my current code:

import dash
from dash import Dash, html, dcc, Output, Input, callback
from databricks import sql
import plotly.express as px
import os

app = Dash()
server = app.server

# Set these as environment variables in Dash Enterprise or locally
SERVER_HOSTNAME = os.getenv("SERVER_HOSTNAME")
HTTP_PATH = os.getenv("HTTP_PATH")
ACCESS_TOKEN = os.getenv("ACCESS_TOKEN")

DB_NAME = "plotly_iot_dashboard"
TABLE_NAME = "silver_users"

app.layout = html.Div(
    [
        dcc.Loading(dcc.Graph(id="sample-chart")),
        dcc.Input(id="val-selector", type="number", min=20, max=80, step=5, value=40),
    ],
    style={"background-color": "white", "height": "100vh"},
)

@callback(Output("sample-chart", "figure"), Input("val-selector", "value"))
def create_chart(selected_val):
    connection = sql.connect(
        server_hostname=SERVER_HOSTNAME, http_path=HTTP_PATH, access_token=ACCESS_TOKEN
    )
    cursor = connection.cursor()
    cursor.execute(
        f"SELECT * FROM {DB_NAME}.{TABLE_NAME} WHERE age > {selected_val} LIMIT 100"
    )
    df = cursor.fetchall_arrow()
    df = df.to_pandas()

    cursor.close()
    connection.close()

    return px.scatter(df, x="height", y="weight")

if __name__ == "__main__":
    app.run(debug=True)

Problem: The app works fine when first loaded, but as I adjust the filter (the input value), the queries seem to take forever to execute or the app crashes. What can I do to improve performance and avoid timeouts?

Hi @JackShaw,

Do you know how long your queries take on average? Most web servers have a timeout of 30 seconds, so if your query takes longer than that, the server will terminate the connection. Does your app already crash in your local dev environment, or only when deployed? You can set the timeout in Gunicorn, for example.
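If you do deploy with Gunicorn, the worker timeout can be raised in a gunicorn.conf.py. This is only a sketch with example values (bind address, worker count, 120 seconds), not something taken from your setup:

# gunicorn.conf.py -- example values, adjust to your deployment
bind = "0.0.0.0:8050"  # address/port the app is served on
workers = 2            # number of worker processes
timeout = 120          # seconds a worker may spend on one request before being killed

The same setting is also available on the command line, e.g. gunicorn app:server --timeout 120.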

If your queries finish in under 30 seconds, you may want to check your memory usage; maybe the query result exceeds what your server can hold? Hard to say without some stats.

What I do in dashboards to improve performance is to cache the query results for the most common filter permutations. That won't directly solve your problem on a cache miss, though.
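As a rough sketch of that idea, assuming you install flask-caching (the Cache object and the get_filtered_df helper below are made up for illustration, they are not part of your app), you could memoize the query per filter value:

from flask_caching import Cache

# SimpleCache stores results in process memory; use a Redis or filesystem backend in production
cache = Cache(app.server, config={"CACHE_TYPE": "SimpleCache", "CACHE_DEFAULT_TIMEOUT": 300})

@cache.memoize()
def get_filtered_df(selected_val):
    # Runs the query once per distinct selected_val; repeated calls with the
    # same value are served from the cache instead of hitting the warehouse.
    with sql.connect(
        server_hostname=SERVER_HOSTNAME, http_path=HTTP_PATH, access_token=ACCESS_TOKEN
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(
                f"SELECT height, weight FROM {DB_NAME}.{TABLE_NAME} WHERE age > {selected_val} LIMIT 100"
            )
            return cursor.fetchall_arrow().to_pandas()

Your callback would then call get_filtered_df(selected_val) instead of opening a new connection and querying on every input change.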

Hope this helps. If not, try to provide some information like memory size, DataFrame dimensions, and execution time.

Here is a simple wrapper to track execution times:

import functools
import time
from datetime import datetime

def exec_timer(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = f(*args, **kwargs)
        end_time = time.time()
        # functools.wraps copies f.__name__ onto wrapper, so this logs the decorated function's name
        print(wrapper.__name__, datetime.now(), flush=True)
        print(f"Execution of function {wrapper.__name__} took {end_time - start_time:.2f} seconds.")
        return result
    return wrapper

@exec_timer 
def expensive_function():
    pass

You could also add a print to display the result's memory size.
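For a pandas DataFrame, a minimal sketch of that (log_df_size is just an illustrative name) could look like:

import pandas as pd

def log_df_size(df: pd.DataFrame) -> None:
    # deep=True also counts string/object columns, not just the numeric blocks
    size_mb = df.memory_usage(deep=True).sum() / 1024 ** 2
    print(f"DataFrame shape: {df.shape}, approx. memory: {size_mb:.1f} MB", flush=True)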

Kind regards,
Christian