How to fetch real time sql data without refreshing page

Hi Dash Community

I’m trying to build an app which currently connects with the sql server database (only available on my company’s internet), then doing some calculations and displaying the graphs created from it on my Dash app.
I’m able to create a cool dashboard, but i’m still not able to know how to update the data of the graph after every 30 seconds without refreshing the page. Also i tried using the method of storing data in dcc.div and dcc.store but it did’nt work as my dataframe was too large. Please someone help me.

Main challenge I’m facing is the live updation of my graph data as you can see in the last callback all the filteration on the basis of dropdowns is being done and the grahs gets updated on the basis of that data. The filtered data can still be of 100000 lines. The process is slow too sometimes but that’s the secondary issue . Primary one is that my graph data must get updated every 30 seconds.
That means whole dataframe must get updated every 30 seconds.
Explaining further my SQL database is fetching data from a machine every 30 seconds. I wanna show that update on my graph alongwith the previous values of the same date.
In my original data I’m updating 10 graphs from that data. Also some text data is also being displayed in that original app which i want to get updated every 30 seconds.

While using dcc.interval it’s updating the graphs after every 30 seconds but the data remains same. The latest data is not being fetched from the source.

So in simple words, the issue I’m facing is I’m not able to refresh my data displayed by my graphs in real time. The data is being fetched from a sql database.

import dash
import dash_core_components as dcc
import dash_html_components as html
import pandas as pd
import pyodbc
import plotly.graph_objects as go
from dash.dependencies import Input, Output
from plotly.subplots import make_subplots
import flask
from flask_caching import Cache
import os
import numpy as np

server = flask.Flask(__name__)
app = dash.Dash(__name__, server=server)

cache = Cache(app.server, config={
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_URL': os.environ.get('REDIS_URL', '')
})
app.config.suppress_callback_exceptions = True

TIMEOUT = 30

@cache.memoize(timeout=TIMEOUT)
def query_data():
    driver = "SQL Server"
    server = "***.**.146.96\SQLEXPRESS"
    database = "***IIOT"
    username = "IIOTUser"
    password = "****#13579"

    # Create the connection
    conn = pyodbc.connect(
        "driver=" + driver + ";server=" + server + ";uid=" + username + ";pwd=" + password + ";database=" + database)

    # query db
    sql = """

    SELECT * FROM Ichinose1

    """
    df = pd.read_sql(sql, conn)

    #my real code contains 10 lines of codes for initial calculations which takes more time
    df.fillna(0, inplace=True)
    df['Date_Time'] = pd.to_datetime(df['shiftDate'].str[:10] + " " + df['shiftTime'].str[:8])
    df['Run_Category'] = np.where(df['TB'] == 1, "Trial ", np.where(df['TB'] == 2, "Bulk ", np.nan))
    return df #i chose returning original dataframe as in json form it was showing memory error


def dataframe():
    return query_data()

df = dataframe() # this was done because
                 # if i was putting query_data() in all
                 #the three callbacks then it was taking alot of time to load

app.layout = html.Div([
        dcc.Interval(id='refresh', interval=30000),
        dcc.Dropdown(id="Date",
                     options=[{"label": i, "value": i} for i in df['shiftDate'].unique()

                     ],
                     multi=False,
                     ),
        dcc.Dropdown(id="select_PO",
                     multi=False,
                     ),
        dcc.Dropdown(id="select_RunCategory",
                     multi=False,
                     ),
        dcc.Graph(id='Speed_Length_Graph', figure={}),

])

@app.callback([Output(component_id="select_PO", component_property="options"),
               ],
              [Input(component_id="Date", component_property="value")
               ])
def update_dropdown(Date_value):
    if Date_value is not None:
        return_PO = df.loc[df['shiftDate'] == Date_value]['ProductionOrder'].unique()
    else:
        return_PO = df['ProductionOrder'].unique()
    return [[{"label": i, "value": i} for i in return_PO]]


@app.callback(
    [
        Output(component_id="select_RunCategory", component_property="options")],
    [
        Input(component_id="Date", component_property="value"),
        Input(component_id="select_PO", component_property="value"),
    ]
)
def update_run_category(Date_value, PO_Value):
    if Date_value is not None:
        if PO_Value is not None:
            return_run_category = df.loc[(df['shiftDate'] == Date_value) & (df['ProductionOrder'] == PO_Value)]['Run_Category'].unique()
        else:
            return_run_category = ["Please select a P.O."]
    else:
        if PO_Value is not None:
            return_run_category = df.loc[df['ProductionOrder'] == PO_Value]['Run_Category'].unique()
        else:
            return_run_category = ["Please select a P.O."]
    return [[{"label": i, "value": i} for i in return_run_category]]


@app.callback(
    [Output(component_id="Speed_Length_Graph", component_property="figure"),
     ],
    [
     Input(component_id="Date", component_property="value"),
     Input(component_id="select_PO", component_property="value"),
     Input(component_id="select_RunCategory", component_property="value"),
     Input(component_id="refresh", component_property="n_intervals")
     ])
def update_graph(Date_value, PO_Number, Run_Category, n):
    
    # To filter df w.r.t. dropdown values
    if Date_value is not None:
        if PO_Number is not None:
            if Run_Category is not None:
                df1 = df.loc[(df['shiftDate'] == Date_value) & (df['ProductionOrder'] == PO_Number) & (
                            df['Run_Category'] == Run_Category)]
            else:
                df1 = df.loc[(df['shiftDate'] == Date_value) & (df['ProductionOrder'] == PO_Number)]
        else:
            df1 = df.loc[df['shiftDate'] == Date_value]
    else:
        if PO_Number is not None:
            if Run_Category is not None:
                df1 = df.loc[(df['ProductionOrder'] == PO_Number) & (df['Run_Category'] == Run_Category)]
            else:
                df1 = df.loc[df['ProductionOrder'] == PO_Number]
        else:
            df1 = df.loc[df['shiftDate'] == df.loc[df.last_valid_index(), 'shiftDate']]


    fig = make_subplots(
        specs=[[{"secondary_y": True}]]
    )

    fig.add_trace(
        go.Scattergl(x=df1['Date_Time'].astype(str), y=df1['FabricSpeed'], name='Fabric Speed',
                     ), secondary_y=False,
    )

    fig.add_trace(
        go.Scattergl(x=df1['Date_Time'].astype(str), y=df1['POLength'], name='Fabric Length',
                     ), secondary_y=True,
    )

    fig.update_layout(
        title_text='Speed & Length of P.O. w.r.t. Date-Time',
    )
    return fig

if __name__ == '__main__':
    app.run_server(debug=True)

HI @bsbalkar
Your df = dataframe() function is outside any callback functions. That is why your SQL data never gets triggered and is only refreshed when your page loads. Try putting the df = dataframe() inside your callback like this.

@app.callback(
    [Output(component_id="Speed_Length_Graph", component_property="figure"),
     ],
    [
     Input(component_id="Date", component_property="value"),
     Input(component_id="select_PO", component_property="value"),
     Input(component_id="select_RunCategory", component_property="value"),
     Input(component_id="refresh", component_property="n_intervals")
     ])
def update_graph(Date_value, PO_Number, Run_Category, n):
    
    # To filter df w.r.t. dropdown values
    if Date_value is not None:
        if PO_Number is not None:
            if Run_Category is not None:
                df = dataframe()
                .....

You don’t even need the interval in the callback above. Maybe you can try doing a separate callback just for the interval and sql data update, and store it in dcc.Store(). Then you can use your data from Store in your Dropdown callbacks:

@app.callback(
    Output(store, 'data'),
    [Input(component_id="refresh", component_property="n_intervals")]
)
def update_graph(n_intervals):
    df = dataframe()
    return df

Don’t forget to include the dcc.Store(id='memory') inside the app.layout() section.
You can read more about dcc.Store here.

P.S. you might encounter issues with data refreshing every 30 seconds because you don’t want the dropdown data options to change while users are choosing values in the dropdowns.

Hi @adamschroeder
Thanks for your reply.

This way dcc.interval will update the data really well. but then the problem is with the dcc.store.

That way it’ll work but the problem is that my data refreshes after every 30 seconds.
My dataframe sontains 30 columns.

So my maximum no. of rows for a day can be 24 * 60 * 2 = 2880 for a day.

And if the user wants to display a month’s data in one go then the max no. of rows can be of 2880 * 31 = 89280.

So my question is that will the dcc.store be able to store a dataframe of size 89280 * 30 in it?

I read that redis is storing the data in the cache form in the system. So whenever the data gets updated by the dcc.interval, is there any way to store that data to the system cache wherever and then fetching it from it?

I hope I was able to explain and may get some help in it.

Thanks in advance :slight_smile:

Redis can definitely be a good option. I’m not sure of the size of data that dcc.Store() can store.

Hi @adamschroeder
Do you know then how will i be able to store that data on redis ?

No, sorry. I don’t have experience with redis.

It’s ok @adamschroeder, don’t be sorry. You have already helped alot.

You could try my serverside cache. Per default it writes the data to files, so there is no immediate size limit. Here is an example of the usage,

1 Like

Hi @Emil
Thanks for your reply. In your example too you’re storing the data in the dcc store. My question is that the dcc store will be able to store the whole amount of data that i want to store?
Refer to the link below for more information

In the example i am using a custom cached_callback decorator (instead of the normal callback decorator). The effect of this decorator is that only a reference to the data will be stored in the Store component. The actual data are stored in the cache, which in the example is a file on disk on the server. As a result, you will not be limited by the amount of data data the Store component can handle. As a bonus, you should see a significant performance improvement when accessing the data in server side callbacks, i.e. the ones written in python. Note however that client side callback will not be able to access the data.

Hi @Emil
In my app there is’nt any dropdown options to provide as an input to the cached_callback.

So instead of using this below:

@cc.cached_callback(Output("raw_store", "data"), [Input("data_dd", "value")])
def fetch_data(key):
    df = plotly.data._get_dataset(key) if key is not None else None
    time.sleep(1)  # sleep to emulate a database call / a long calculation
    return df

I tried using the dcc interval as an input to the cached callback.
and i’m directly returning df in it without using this code “df = plotly.data._get_dataset(key) if key is not None else None”

Let me show you the code below-

@cc.cached_callback(Output("raw_store", "data"), [Input("interval_component", "n_intervals")])
def fetch_data(key):
    time.sleep(1)  # sleep to emulate a database call / a long calculation
    return df

And i’m using raw_store only. I’m not using two dcc store(i.e. filtered_store in your case).

Same interval component and the data of raw_store had been applied to the dynamic dropdowns graphs which will receive the data from “raw_store”.

Problem is that the dropdowns and graphs are not receiving data in the valid form. Also it also shown a memory error cannot allocate 65.7 mb of data.

Also I can only access sql server from my office internet at around this time so i won’t be able to test it while you reply me. Sorry for the inconvenience from my side but this will take alot of time to run it after you suggest me the solution.
But the

I tried using the code below and i was able to see the graphs on my screen.
In this I had fetched the sql data in the cc.cached_callback and did it’s all the data cleaning in that function only.
I had to provide something as an input so i had used start date of data-range-picker which is not been used in the function anywhere.

@cc.cached_callback(Output("raw_store", "data"), [Input("date_range_picker", "start_date")])
def fetch_data(s):
    """
    all the parameters to fetch data from sql server to data cleaning of the data.
    """
    time.sleep(1)  # sleep to emulate a database call / a long calculation
    return df # returning the dataframe after fetching it from sql and data cleaning of it

Still I saw this error while updating my dynamic dropdowns. Pls help

Invalid argument `options[0]` passed into Dropdown with ID "select_RunCategory".
Expected `object`.
Was supplied type `array`.

As i read the previous posts, you intend to update the data at regular intervals, e.g. 30 seconds. In this case, the input to the fetch_data callback should be the interval component.

The error you see seems to be related to the options of the drop down being set to a wrong type (array). However, without seeing the actual code, it’s hard to pin down exactly what goes wrong.

Hi @Emil
Thanks for your reply.
Here’s the code of one of my dynamic dropdown which is getting input from a date-picker-range, another dropdown and the dcc store named “raw_store”.

@cc.callback([Output(component_id="select_PO", component_property="options"),
               ],
              [Input(component_id="select_date_range", component_property="start_date"),
               Input(component_id="select_date_range", component_property="end_date"),
               Input(component_id="Shift", component_property="value"),
               Input(component_id="raw_store", component_property="data")
               ])
def update_dropdown(start_date, end_date, Shift_value, df):
    #df = dataframe()
    if all([start_date is not None, end_date is not None]) is True:
        Date_value = pd.date_range(start_date, end_date).strftime('%Y-%m-%d')
    else:
        if all([start_date is None, end_date is not None]) is True:
            Date_value = [end_date]
        elif all([end_date is None, start_date is not None]) is True:
            Date_value = [start_date]
        else:
            Date_value = None
    if Date_value is not None:
        if Shift_value is not None:
            return_PO = df.loc[(df['cdate'].isin(Date_value)) & (df['shift'] == Shift_value)]['Production_Order'].unique()
        else:
            return_PO = df.loc[df['cdate'].isin(Date_value)]['Production_Order'].unique()
    else:
        if Shift_value is not None:
            return_PO = df.loc[df['shift'] == Shift_value]['Production_Order'].unique()
        else:
            return_PO = df['Production_Order'].unique()
    return [[{"label": i, "value": i} for i in return_PO]]

Since you have only a single output, i would suggest removing the brackets around the output, i.e. write it as

Output(component_id="select_PO", component_property="options"),

as well as the return statement, i.e. write it as

[{"label": i, "value": i} for i in return_PO]

Hi @Emil

Thanks for your help. The dropdowns are working correctly and i’m able to get data on the graphs too.

Now I’m facing the below problems:-

1
Whenever i’m using dcc interval the dcc store “raw_data” shows an error. It shows that there was a problem updating it. I tried increasing the time from 30 seconds to 60 seconds too but it was still showing error. Maybe it means i have to increase the interval time. But i want my data to be updated after every 30 seconds.

2
Whenever i select a value in date picker range or dropdowns the page tooks too long to update.

3
The graph is not showing the live data. It’s not refreshing the value after every 30 seconds. I tried using dcc interval too but as said above it’s not working properly.

Pls help
Thanks in advance

Hi @Emil

I just want to inform you that everything working fine if I’m not using dcc interval in cached_callback. But the I’m not able to get the real time data on graph.

So i tried using dcc interval and it shows me an error. I’m attaching the pic of that error in this reply.
I hope you’ll help me.

Hi @Emil
Thanks for your help. Your server side cache is awesome. It has also increased my speed of selection of data with dropdowns too. I found the actual problem in my case was the slow speed of fetching data from sql. Also the further calculation takes time some time too. That’s why i was not getting desired result with your server side cache.

Now i had reduced my code of calculations and server side cache is being updated every 30 seconds. I’m fetching and updating my data separately with a while True infinite loop. Though I’m still getting live data with a delay of 1.5 to 2 mins but my graphs and data gets updated after 30 mins.

That means, my graphs are getting updated every 30 seconds as desired but the base data from which i’m getting input is always showing data of 2 mins before. I’m fetching data with pyodbc. I need to look for a faster way to fetch it to reduce this delay to upto 30 to 60 seconds.

Thanks for you help @Emil

1 Like