Working principle of database connections and data refresh

Good day everyone,

I’d need some help in understanding a Dash mechanic.
I created a single app application, that has to be used by maximum 20 users and I already deployed the app on a web server.
I am currently reasoning on how the database connections work and how the data is updated.
Before defining the layout I create a sqlalchemy database connection and periodically, through a background process, this data is retrieved again, I am not using callbacks to do this but an apsscheduler.
What I am worried about is the fact that I don’t really understand how Dash manages this by default, thus it is not clear to me how the data is refreshed and how a connection’s session is restarted. Is a new session of a connection to a database opened everytime a user clicks refresh?
What happens if no one is connected?
Sorry for the very basic questions but I only have very basilar knowledge of web development and I am trying to understand better this concepts and to avoid adding useless overheads to my application

Good day! :wave:

Hm. What is the nature of your “background process”? I’m not saying it’s impossible, especially if you have it working… it’s just hard for me to see how you could update a component without a callback or it becoming a Rube Goldberg machine.

Maybe it’d help to show a minimum example of how I understand component updates. After we initialize an app, e.g. app = dash.Dash("My app"), app.layout will be returned to the user by app.server. We populate the layout with components, e.g. app.layout = dash.html.Div("howdy").

Let’s say I initialized a component with data retrieved from a DB in Python, and then update that data in Python:

import dash

# Pretend I had a DB connection here
# and populated the variable with retrieved initial data that happens to be a string.
my_value_from_db = "default value until data is retrieved"

app = dash.Dash("My app")

app.layout = dash.html.Div(my_value_from_db)

# we could wait however long here, such as a scheduler might

# now update the variable 
# (e.g. could be an updated version of my string data from the DB)
my_value_from_db = "updated data"

Now when I send a request to app.server, I will only ever get an HTML page with the text “default value until data is retrieved”. Updating a value (such as data) I created a component with in Python is not sufficient to update that component in the app.layout. That requires a callback (to go and update the component in the app).

My approach, if I wanted data in a component to update when the page is loaded or reloaded: I would use a dcc.Location component to trigger a DB connection and component update any time the page is loaded. Here’s a short example:

import dash

# Pretend I had started a DB connection here
# and populated the variable with retrieved initial data that happens to be a string.
my_value_from_db = "default value until data is retrieved"

app = dash.Dash("My app")

# Define the app layout
app.layout = dash.html.Div(
        [
            # The location component is going to trigger a data retrieval / update
            dash.dcc.Location(id='url', refresh=False),
            dash.html.Div(my_value_from_db, id="some_unique_id")
        ]
    )

@dash.callback(
    dash.Output('some_unique_id', 'children'),
    dash.Input('url', 'href')  # href updates even on first load
)
def update_data_on_refresh(pathname):
    # when href updates,
    # we return the new value of the Div's children
    # (e.g. could connect to the DB and get an updated version of my string data here)
    return "updated data"
    
# Be sure to run the server AFTER defining the callback!
app.run_server()

Note I believe that the “default data” would be shown until the callback could finish running and update the component.

Hope that helps! :space_invader:

Hello,
thanks for the reply, I did not know this approach, it is very useful.
I understand now that you need a callback to reconnect to the database and get the data, however
I am wondering if some sort of timeout in the connection could happen. I will share my code so that I can make myself clearier:

This is the setup, I learnt what caching is and tried to implement it, what I want is to have a persistent connection that every x minutes updates the data (also refreshing the data based on page refresh is fine, this was the first idea I got and I am worried I am adding too much overhead

import os
import sys
import dash_bootstrap_components as dbc
from apscheduler.schedulers.background import BackgroundScheduler
import dash
import pandas as pd
from dash import html, dcc, callback, Output, Input, ctx
from dash.dash_table import DataTable
from waitress import serve

# Set the server
server = Flask(__name__)

# Initialize the dash app object
app = dash.Dash(__name__,
                server=server,
                external_stylesheets=[dbc.themes.COSMO, dbc.icons.BOOTSTRAP],
                pages_folder="",
                meta_tags=[{"name": "viewport",
                            "content": "width=device-width, initial-scale=1"}],
                suppress_callback_exceptions=True,
                use_pages=False)

# Define caching
config = {
    "DEBUG": True,  # some Flask specific configs
    "CACHE_TYPE": "SimpleCache",  # Flask-Caching related configs
    "CACHE_DEFAULT_TIMEOUT": 300  # 5 minutes of caching
}

# Configure the caching into Flask
app.server.config.from_mapping(config)

# Define the cache object
cache = Cache(app.server)

# Define app's title
app.title = 'App title'

# Create a connection to the db
db_connection = get_db_connection()

This is my ‘background cache update’:

# ----------------------- SET UP CONSTANTS AND DATA DEFINED AT THE APPLICATION START ---------------------
def load_data(connection):
    """This function extracts the data from the database"""

    # Extract materials data info
    materials_data = connection.execute_pandas_query(query_select_materials)

    # Extract the list of unique businesses
    businesses_list = connection.execute_pandas_query(query_select_business)
    businesses_list = list(businesses_list.iloc[:, 0].unique())

    # Extract a list of selectable materials
    materials_list = list(materials_data.iloc[:, 0].unique())

    # Extract a list of selectable material batches
    materials_batches = list(materials_data.iloc[:, 1].unique())

    # Get the data columns
    data_columns = connection.execute_pandas_query(query_column_names).iloc[:, 0].to_list()

    # Initialize an empty dataframe with same columns as in the db
    empty_data = pd.DataFrame(columns=data_columns)

    return (materials_list.copy(), materials_batches.copy(),
            empty_data.copy(), materials_data.copy(), businesses_list.copy())


def update_cache(connection):
    """Function to update the cached data"""
    # If the cache is empty then update the data
    if cache.get('data') is None:
        # Load the data from the db
        mat_list, mat_batches, empty_data_table, data_table, business_list = load_data(connection)

        # Cache the data
        cache.set('data', data_table)
        cache.set('empty_data', empty_data_table)
        cache.set('materials_list', mat_list)
        cache.set('materials_batches', mat_batches)
        cache.set('business_list', business_list)
        print("Cache updated.")


# Job scheduler configuration: in background the cache update is performed every 4 minutes
scheduler = BackgroundScheduler()
scheduler.add_job(func=lambda: update_cache(db_connection), trigger="interval", minutes=4)
scheduler.start()

# Populate the cache before starting the app
update_cache(db_connection)

And then my layout and callbacks are defined, I pass to my components the cached data, like:
dcc.dropdown(cache.get(‘cached list’))

I really don’t know if this is too much / an anti-pattern

Is a persistent DB connection necessary?

If you’re okay with connecting to the DB each time you want it to update, you could have a dcc.Interval component’s n_interval property trigger the update_data callback rather than (or in addition to) the dcc.Location’s href. That update callback could connect to the DB, update, and then close the connection to the DB.

1 Like

I like this approach, yesterday I finished reading the whole user guide’s documentation and came across these componente you tougut me, so now everything Is clearier. To answer your question: I am not sure if I want a persistent db connection, I tried two ways:
1- the one listed above in which the connection Is reused across callbacks to get the data needed each time and leave to my Sql database the heavylifting of the tables computation

2- open a connection in each callbacks that needs the sql Data.

Some days ago I found the second approach slower, I think It Is due to the latency introduced by the connection opening and closing, however I don’t know if there’s a best practice :confused: