Dash - MATCH and performance - Data Handling

Hi,

I am new to the community. This is the first time I have developed a Dash app, and I am struggling with performance. I think it has to do with the logic of how I implemented my app. My app is pretty complex: it takes a dataset uploaded by a user, then, based on a dropdown selection, the user can select attributes. Once an attribute is selected, analysis is run on it and graphs are created.

The bottleneck of my app is the data handling, since it is highly interactive (the user can add multiple attributes and filter by clicking on graphs). Each time a graph needs to be created, the data is loaded again, even though I am using a Redis cache. I don’t know how to avoid converting the JSON data from my cache back to a DataFrame each time a graph needs to be created.
The callback that creates the issue is the following one:

import dash
import pandas as pd
from dash import MATCH, Input, Output
from dash.exceptions import PreventUpdate

@dash.callback(
    Output({"type": "dynamic-graph", "attribute": MATCH, "graph_type": MATCH}, "figure"),
    [
        Input({"type": "dynamic-graph", "attribute": MATCH, "graph_type": MATCH}, "id"),
        Input("dropdown-selection-status", "data"),
        Input("layout-status", "data"),
        Input("filtered-data", "data"),
    ],
    prevent_initial_call=True,
)
def update_graphs(id, dropdown_status, layout_status, filtered_data_json):
    if not (layout_status and dropdown_status) or not (
        layout_status.get("is_complete") and dropdown_status.get("selection_updated")
    ):
        raise PreventUpdate

    attribute_name = id["attribute"]
    graph_type = id["graph_type"]

    filtered_data_json = UtilsDataManager.get_data("filtered_data")  # load data from redis cache
    filtered_data = pd.DataFrame(filtered_data_json)  # convert to pandas dataframe
    figure = UtilsDataManager.get_or_create_figure(
        attribute_name, graph_type, filtered_data
    )  # create figure or only update data from figure

    return figure if figure else dash.no_update

def get_or_create_figure(attribute_name, graph_type, filtered_data):
    dropdown_selection = get_data("dropdown_selection") or []
    if attribute_name not in dropdown_selection:
        return None

    graph_key = f"graph_{attribute_name}_{graph_type}"
    figure = get_data(graph_key)

    if figure is None:
        create_func = UtilsDashHelpers.get_graph_functions(graph_type, "create")
        figure = create_func(filtered_data, attribute_name)
    else:
        update_func = UtilsDashHelpers.get_graph_functions(graph_type, "update")
        figure = update_func(figure, filtered_data, attribute_name)

    set_data(graph_key, figure)
    return figure

GRAPH_TYPE_FUNCTIONS = {
    "completeness": {
        "create": UtilsGraphics.create_pie_chart_completude,
        "update": UtilsGraphics.update_pie_chart_completude,
    },
    "uniqueness": {
        "create": UtilsGraphics.create_pie_chart_uniqueness,
        "update": UtilsGraphics.update_pie_chart_uniqueness,
    },
    "validity": {"create": UtilsGraphics.create_pie_chart_valid, "update": UtilsGraphics.update_pie_chart_valid},
    "pattern": {"create": UtilsGraphics.create_bar_chart_pattern, "update": UtilsGraphics.update_bar_chart_pattern},
    "length": {"create": UtilsGraphics.create_bar_chart_length, "update": UtilsGraphics.update_bar_chart_length},
    "frequency": {
        "create": UtilsGraphics.create_bar_chart_frequency,
        "update": UtilsGraphics.update_bar_chart_frequency,
    },
}


def get_graph_functions(graph_type, func_type):
    graph_functions = GRAPH_TYPE_FUNCTIONS.get(graph_type)
    if not graph_functions:
        raise ValueError(f"Invalid graph_type: {graph_type}")
    return graph_functions[func_type]

This is the logic I am currently using. Each time the callback is called, the data is reloaded and converted to a DataFrame. Have you faced similar performance issues? If so, did you succeed in making things more efficient?

Thank you for your thoughts and input on this.

Hello @og_gremlins,

Not sure whether it would perform better than a Redis cache (I don’t know much about them, to be honest). However, have you considered separating the data-loading part of your callback into a separate callback with a dcc.Store component? That way your callback would not need to load the given data every time it gets triggered… Just a thought…

Cheers!

Hello @og_gremlins,

Welcome to the community!

Using Redis, it looks like you are already performing some sort of caching.

What exactly is the filtered-data? Is this a dcc.Store that you update when a selection is triggered?

I see that you take the input as filtered_data_json, but then overwrite it with the value from Redis, so the input is kinda pointless. It seems like you are trying to perform some sort of chained callback, but possibly triggering it more often than you should.

To reduce the redundancy of pinging the Redis server, you could introduce flask-caching on the get_data function, but this might cause some issues in your case.


To do a chained callback, I’d do something like this:

user selected dropdown → calculations run, pass timestamp to dcc.Store → dcc.Store triggers graphs reloading

The resulting end graph callback would look something like this:

@dash.callback(
    Output({"type": "dynamic-graph", "attribute": MATCH, "graph_type": MATCH}, "figure"),
    [
        Input({"type": "dynamic-graph", "attribute": MATCH, "graph_type": MATCH}, "id"),
        Input("filtered-data", "data"),
    ],
    prevent_initial_call=True,
)
def update_graphs(id, filtered_data):
    ...

If your calculations take 10 seconds to run, then you’ll be able to cache the redis response for that amount of time safely, IMO.

To cache, you can do this:

from flask import Flask
from flask_caching import Cache

app = Flask(__name__)
cache = Cache(config={"CACHE_TYPE": "SimpleCache"})
cache.init_app(app)

@cache.memoize(10)
def get_data(type):
    ...

You’ll need to make sure you introduce your caching in the app in a beneficial way, check out more here:

Hello @jinnyzor,

To give you more context about my app, I have a few other callbacks. When the user selects an attribute from the dropdown, two callbacks are executed. One analyses the selected column and adds more columns to the original dataset. The other adds components to the layout, which are used by the update_graphs callback. Each callback uses a dcc.Store to signal when to launch graph creation.

I also have a callback that receives clicks on the displayed graphs. Once a graph is clicked, the data is filtered and thus all graphs are reloaded (only the data behind the graphs is updated).

As you said, I am already using a Redis cache (CACHE_TYPE=redis). I only use get and set methods that I defined to retrieve cached data.

cache_config = {
    "CACHE_TYPE": CACHE_TYPE,
    "CACHE_REDIS_URL": CACHE_REDIS_URL,
}

For each selected attribute, 5 or 6 graphs are created or updated. This is where the issue happens: for a large dataset, each time update_graphs is called it takes a lot of time because it has to convert the data to a DataFrame. I don’t know if there is a way to avoid this.

Thanks for your input and for taking the time to answer, I appreciate it. I will definitely keep trying to improve my app.

Assuming your df isn’t going to change between all the graphs, you could turn that conversion into a function and memoize the response?

Ahh, well said. So, defining a simple function that returns pd.DataFrame(filtered_data_json) with a memoize decorator and then calling this function in my callback? I have to try it.
