Choroplethmapbox update _very_ slow when deployed

Hi everyone,
I have written a dashboard to explore COVID-19 data; the code can be found here: [matteomancini/covid19ita-dash](https://github.com/matteomancini/covid19ita-dash). Briefly, I use a Choroplethmapbox to show the number of cases in Italy, and with some controls one can look at a different date or plot daily changes. When I run it locally or in JupyterLab, everything is fine. However, when deployed on Heroku or on AWS, updating the map takes between 45 and 55 seconds (!), while locally it takes a couple of seconds. The function that takes care of the update is the following:

@app.callback(
    [Output('prov-choropleth', 'figure'),
     Output('table', 'data')],
    [Input('date-slider', 'value'),
     Input('selectmeasure', 'value')],
    [State('prov-choropleth', 'figure')])
def update_figure(selected_date, measure, figure):
    filtered_df = data[data.date_index == selected_date]
    z_min = filtered_df[filtered_df.Provincia != temp_label][measure].min()
    z_max = filtered_df[filtered_df.Provincia != temp_label][measure].max()
    table_data = filtered_df.to_dict('rows')

    if figure['data']:
        figure['data'][0]['z'] = filtered_df[measure]
        figure['data'][0]['zmin'] = z_min
        figure['data'][0]['zmax'] = z_max
        figure['data'][0]['name'] = measure

    else:
        trace = go.Choroplethmapbox(z=filtered_df[measure], geojson=counties, locations=filtered_df['codice_provincia'],
                                    featureidkey='properties.prov_istat_code_num', zmax=z_max, zmin=z_min, name=measure,
                                    hovertemplate='%{text}<extra>%{z}</extra>',
                                    text=filtered_df['Provincia'] + "<br>" + filtered_df['Data'])

        figure['data'] = [trace]

    return [figure, table_data]

Basically, the first time it is called it generates the map using go.Choroplethmapbox; after that, it just changes the z values in the existing figure.

I have tested a few things to try to figure out how to fix the problem:

  • there is another callback in the code, and I tried removing it completely to see if things would improve (they didn’t);
  • I tried stripping the function of most of its operations, and the bottleneck is specifically the map update (as one would expect);
  • I tried to deploy the app “locally” on two different Raspberry Pi devices: the first, an RPi Zero with a single-core processor, would take 40 seconds each time to update the map (still less than Heroku and AWS); the second, an RPi 3 with a quad-core processor, would take 15 seconds;
  • based on what I observed from the “local” deployments, I tried different instance types on AWS: instead of the minimal t2.micro, I tried first a t3a.nano and then a t3a.xlarge (!!!), but got the same result (45-50 seconds).
At this point, I’m not sure what I should try to make the update times reasonable. I’ve looked at other dashboards with maps, but I could not find anyone actually updating a Choroplethmapbox. Has anyone else experienced something similar? I would be very happy to try other ways to implement the map update if anyone has any ideas.
Thanks in advance.

A brief update on this: as a double-check, I actually timed the function that updates the map (computing time.perf_counter() at the beginning and at the end and returning the difference). It turns out that the actual execution time is less than a second (!!!). Is it then (1) the callback mechanism or (2) the rendering that is taking so long?
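For reference, this is roughly the check I did, reduced to a minimal runnable sketch (the dummy body stands in for the dataframe filtering and figure update shown above):

import time

def timed_update(selected_date):
    # Dummy stand-in for the callback body, just to show the measurement.
    start = time.perf_counter()
    result = {"date_index": selected_date}  # the real filtering/figure update goes here
    elapsed = time.perf_counter() - start
    print(f"callback body took {elapsed:.4f} s")  # consistently well below one second
    return result

timed_update(10)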

Hi @matman, welcome to the forum! Given the profiling/debugging you have already done, it is possible that the time is spent in the exchange between the Python server on Heroku and the client in your browser. Did you check the Network tab of your browser's developer tools to see whether a lot of time is spent in network exchanges? This can happen in particular if your geojson file is large. If this is the case, you have several options: either simplify the geometry of the geojson file (dropping some detail) or switch to a clientside callback in order to remove the exchange of information between client and server. See https://dash.plotly.com/performance for more information about performance.
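For the first option, here is a minimal sketch of how the geometry could be simplified, assuming geopandas is available (the file names are just placeholders):

import geopandas as gpd

# Load the original boundaries (file name is just an example).
gdf = gpd.read_file("province_boundaries.geojson")

# Simplify each polygon with a tolerance in degrees; a larger tolerance
# drops more detail and shrinks the file further.
gdf["geometry"] = gdf.geometry.simplify(tolerance=0.01, preserve_topology=True)

# Write the lighter file back out for the dashboard to load.
gdf.to_file("province_boundaries_simplified.geojson", driver="GeoJSON")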

Thank you for the reply, Emmanuelle. I tried to look at what was happening from the browser's perspective, and recording the event timeline in Safari showed that most of the time is indeed spent waiting for network request replies. I then looked more carefully at the time intervals when using the callback on the local server: there is a 5-second delay between the moment I move the slider and the actual entry point of the function, then everything is done in less than a second, and then there are another 10 seconds before the map actually updates.

This is the geojson file I’m dealing with. I don’t have previous experience with geojson, but it doesn’t look too terrible to me, especially since what I need to update is just the associated z values; I assumed that not reloading the geojson data every time would make the update lighter…

I tried to follow your advice and implement a clientside callback. This is what the code looks like now:

import json
from plotly.utils import PlotlyJSONEncoder

@app.callback(
    [Output('map_data', 'children'),
     Output('map_layout', 'children'),
     Output('table', 'data')],
    [Input('date-slider', 'value'),
     Input('selectmeasure', 'value')],
    [State('prov-choropleth', 'figure')])
def update_figure(selected_date, measure, figure):
    filtered_df = data[data.date_index == selected_date]
    z_min = filtered_df[filtered_df.Provincia != temp_label][measure].min()
    z_max = filtered_df[filtered_df.Provincia != temp_label][measure].max()
    table_data = filtered_df.to_dict('rows')

    if figure['data']:
        figure['data'][0]['z'] = filtered_df[measure]
        figure['data'][0]['zmin'] = z_min
        figure['data'][0]['zmax'] = z_max
        figure['data'][0]['name'] = measure

    else:
        trace = go.Choroplethmapbox(z=filtered_df[measure], geojson=counties, locations=filtered_df['codice_provincia'],
                                    featureidkey='properties.prov_istat_code_num', zmax=z_max, zmin=z_min, name=measure,
                                    hovertemplate='%{text}<extra>%{z}</extra>',
                                    text=filtered_df['Provincia'] + "<br>" + filtered_df['Data'])

        figure['data'] = [trace]

    return [json.dumps(figure['data'], cls=PlotlyJSONEncoder),
            json.dumps(figure['layout'], cls=PlotlyJSONEncoder), table_data]


app.clientside_callback(
    """
    function(data, layout) {
        layout.style='carto-positron';
        return {
            'data': JSON.parse(data),
            'layout': JSON.parse(layout)
        }
    }
    """,
    Output('prov-choropleth', 'figure'),
    [Input('map_data', 'children'),
     Input('map_layout', 'children')]
)

What I was trying to do was keep the dataframe processing in Python (since it takes less than a second) and implement the actual map update client-side. I suspect it is not working as I expected: it is still functional, but on the local server it took 14 seconds to update the map and on AWS it took 49 seconds… Is routing the data through a hidden div heavier than I think? :thinking:
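One thing I might still try, if anyone thinks it makes sense: using dcc.Store instead of the hidden divs, since it keeps the intermediate data as JSON and skips the manual json.dumps/JSON.parse round trip. A stripped-down sketch of that pattern (using current Dash syntax, with a bar trace standing in for the Choroplethmapbox just to keep it runnable):

from dash import Dash, dcc, html, Input, Output, State

app = Dash(__name__)

app.layout = html.Div([
    dcc.Slider(id='date-slider', min=0, max=10, step=1, value=0),
    # dcc.Store keeps the intermediate data as JSON, replacing the hidden divs.
    dcc.Store(id='map-data-store'),
    # A bar trace stands in for the Choroplethmapbox to keep the sketch small.
    dcc.Graph(id='prov-choropleth',
              figure={'data': [{'type': 'bar', 'y': [0, 0, 0]}], 'layout': {}}),
])


@app.callback(Output('map-data-store', 'data'),
              Input('date-slider', 'value'))
def prepare_data(selected_date):
    # The dataframe filtering would happen here; only small arrays go to the store.
    return {'z': [selected_date, selected_date + 1, selected_date + 2]}


app.clientside_callback(
    """
    function(storeData, figure) {
        if (!storeData) {
            return window.dash_clientside.no_update;
        }
        // Copy the existing figure and swap in the new values only.
        const newFigure = JSON.parse(JSON.stringify(figure));
        newFigure.data[0].y = storeData.z;  // would be .z for a Choroplethmapbox
        return newFigure;
    }
    """,
    Output('prov-choropleth', 'figure'),
    Input('map-data-store', 'data'),
    State('prov-choropleth', 'figure'),
)


if __name__ == '__main__':
    app.run(debug=True)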

Just a quick update: I did not manage to solve this, and in the end I switched to a scatter mapbox representation, which is lighter (although, I believe, less effective).
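For reference, the scatter-based version looks roughly like this (a sketch with a few made-up province coordinates and counts, not the exact code from the repo):

import plotly.graph_objects as go

# Illustrative per-province coordinates and case counts.
fig = go.Figure(go.Scattermapbox(
    lat=[45.07, 41.89, 40.85],
    lon=[7.69, 12.49, 14.27],
    mode='markers',
    marker=go.scattermapbox.Marker(
        size=[10, 30, 20],      # marker size scales with the number of cases
        color=[120, 900, 450],  # color also encodes the case count
        colorscale='Reds',
        showscale=True,
    ),
    text=['Torino', 'Roma', 'Napoli'],
    hovertemplate='%{text}<extra>%{marker.color}</extra>',
))
fig.update_layout(mapbox_style='carto-positron',
                  mapbox_zoom=4.5,
                  mapbox_center={'lat': 42.5, 'lon': 12.5})
fig.show()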

I would still like to know what the requirements are to dynamically update a Choroplethmapbox in a reasonable time: is it an issue of dedicated CPU time? Of network transfer? The highest-performing AWS instance I tried had 4 cores (Intel Xeon Platinum 8000), 16 GB of RAM (although I don’t think that matters here) and up to 5 Gbps of network bandwidth, and the update time for the Choroplethmapbox was still ~45 seconds. The geoJSON file is 5.8 MB, and even if it has to be transferred every time, 45 seconds still seems like a lot to me, especially since things did not get better when using a clientside callback.
Thank you in advance!

Just for everyone who drops in late.

I have also been playing with Dash and Choroplethmapbox lately, for a COVID-19 dashboard for a research project. Reading these notes from @matman helped a lot with thinking about optimization from the beginning. Thanks for opening this thread.

Our dashboard feels fast enough and can be seen here: https://covid19-bayesian.fz-juelich.de
Of course I tried to keep the shapefile as small as possible, but for us the solution was to use Flask-Caching to avoid rebuilding the figures on every single user request.
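The pattern is roughly the following (a minimal sketch with example cache settings and a stand-in trace, not our exact code):

from dash import Dash
from flask_caching import Cache
import plotly.graph_objects as go

app = Dash(__name__)

# Simple in-process cache; a Redis or filesystem backend is configured the same way.
cache = Cache(app.server, config={'CACHE_TYPE': 'SimpleCache',
                                  'CACHE_DEFAULT_TIMEOUT': 3600})


@cache.memoize()
def build_figure(selected_date, measure):
    # The expensive figure construction lives here; for a given
    # (selected_date, measure) pair it runs once and is then served from the cache.
    return go.Figure(go.Bar(y=[selected_date, selected_date + 1]))

The callback then simply returns build_figure(selected_date, measure), so repeated requests for the same date and measure never rebuild the figure.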

You can check the code here: