
Modify a dataframe across uwsgi workers

Hi all,

Some context first:

I’m using Dash to make a simple annotation app. The user is presented with the contours of bacteria in images.

Clicking on a contour assigns it the currently selected phenotype, which is stored in a phenotype column of a pandas dataframe.
Once a contour is clicked, all the contours are redrawn and colored according to their phenotype value.

The issue

This works fine locally, but in a container with nginx + Flask + uWSGI I have multiple processes (uWSGI workers) that do not share the dataframe, so a modification only takes effect in the process that received the callback.

The question

How can I make sure that my data is correctly synchronized between processes in the app?

Hope I’m clear,

Thanks for the good work!

Great question, and really sweet app! Would love to see it if it’s publicly available :slight_smile:

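So in Dash, it is not safe to modify any variable outside the scope of the callback function because, as you mention, each uWSGI worker holds its own copy of it: a modification is only visible to the process that handled that particular callback.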

Instead, you’ll have to store this data in an intermediate component or perform the computation on every callback (that is, add the click event as an input to every callback that depends on it).
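Whichever option you pick, the key point is that a callback derives its result from its arguments and never writes to the module-level dataframe. A minimal sketch of that idea, assuming a hypothetical read-only table BASE_CONTOURS and an annotations dictionary (contour id -> phenotype) that reaches the callback as an argument rather than living in a global:

```python
import pandas as pd

# Hypothetical read-only contour table: every worker builds the same copy at
# startup and no callback ever mutates it.
BASE_CONTOURS = pd.DataFrame({
    "contour_id": ["c1", "c2", "c3"],
    "area": [120, 95, 143],
})


def with_phenotypes(base, annotations):
    """Return a copy of `base` with a 'phenotype' column filled from `annotations`.

    `annotations` maps contour_id -> phenotype and is passed in by the caller
    (e.g. from a callback argument), so the result depends only on the inputs.
    Contours without an annotation get the label "unassigned".
    """
    result = base.copy()  # never touch the shared frame
    result["phenotype"] = result["contour_id"].map(
        lambda cid: annotations.get(cid, "unassigned")
    )
    return result


if __name__ == "__main__":
    annotated = with_phenotypes(BASE_CONTOURS, {"c2": "rod"})
    print(annotated["phenotype"].tolist())  # ['unassigned', 'rod', 'unassigned']
    print("phenotype" in BASE_CONTOURS.columns)  # False
```

Because the function is pure, it gives the same answer no matter which worker runs it.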

Here’s a simple example showing how to store data in an intermediate component (a hidden div):

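import io

import pandas as pd
from dash import Dash, Input, Output, dcc, html

# Read-only module-level data: callbacks may read it but must never modify it.
df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [3, 1, 2, 3, 5, 6],
    'z': ['A', 'A', 'B', 'B', 'C', 'C']
})

app = Dash(__name__)

app.layout = html.Div([
    dcc.Dropdown(
        id='dropdown',
        options=[{'label': i, 'value': i} for i in ['A', 'B', 'C']],
        value='A'
    ),

    dcc.Graph(
        id='graph'
    ),

    # Hidden div used as an intermediate store: its contents live in the
    # browser and are sent with each callback request, so whichever worker
    # handles the request receives the same data.
    html.Div(id='cache', style={'display': 'none'})
])


# Filter the global df without modifying it and cache the result as JSON.
@app.callback(Output('cache', 'children'), [Input('dropdown', 'value')])
def update_cache(value):
    filtered_df = df[df['z'] == value]
    return filtered_df.to_json()


# Rebuild the dataframe from the cached JSON instead of from server state.
@app.callback(Output('graph', 'figure'), [Input('cache', 'children')])
def update_graph(cached_data):
    if not cached_data:
        return {'data': []}
    # pandas >= 2.1 deprecates passing a literal JSON string to read_json.
    filtered_df = pd.read_json(io.StringIO(cached_data))
    return {
        'data': [{
            'x': filtered_df['x'].tolist(),
            'y': filtered_df['y'].tolist(),
            'type': 'bar'
        }]
    }


if __name__ == '__main__':
    app.run(debug=True)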
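Applied to the annotation app, the same hidden-div idea could hold the clicked contours as a small JSON dictionary (contour id -> phenotype). A sketch of the state update as a pure function; the Dash wiring in the comments is hypothetical (made-up component ids, and it assumes each contour trace carries its id in customdata):

```python
import json


def record_click(store_json, contour_id, phenotype):
    """Return a new JSON string in which contour_id is labelled with phenotype.

    store_json is the current contents of the hidden cache div (None or ""
    before the first click). Nothing is kept on the server, so it does not
    matter which uWSGI worker runs the callback.
    """
    annotations = json.loads(store_json) if store_json else {}
    annotations[contour_id] = phenotype
    return json.dumps(annotations, sort_keys=True)


# Hypothetical wiring (needs State from dash and PreventUpdate from
# dash.exceptions; ids 'contours-graph' and 'phenotype-dropdown' are made up):
#
# @app.callback(Output('cache', 'children'),
#               [Input('contours-graph', 'clickData')],
#               [State('phenotype-dropdown', 'value'),
#                State('cache', 'children')])
# def on_contour_click(click_data, phenotype, store_json):
#     if click_data is None:
#         raise PreventUpdate
#     contour_id = click_data['points'][0]['customdata']
#     return record_click(store_json, contour_id, phenotype)

if __name__ == "__main__":
    state = record_click(None, "c1", "rod")
    state = record_click(state, "c2", "coccus")
    print(state)  # {"c1": "rod", "c2": "coccus"}
```

Reading the cache through State lets the callback see the previous annotations without triggering on them; the graph callback would then take the cache as an Input and recolor the contours.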

Hey @chriddyp, thanks for the swift answer. I was just looking into hidden Divs :smiley: (after extensive searching on parallel computation and trying to understand data management with uWSGI workers :-/). I’ll refactor the code to keep the JSON-serialized data as small as possible, which might be good for the overall architecture of the app anyway.
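To give a rough idea of how much keeping the serialized data small can save, a sketch with made-up sizes: it compares the JSON for a whole long-format contour table against the JSON for only the contour id -> phenotype mapping. It assumes every worker loads the same coordinates at startup and never modifies them, so only the mapping would need to go through the hidden div.

```python
import json

import pandas as pd

# Hypothetical long-format contour table: one row per vertex,
# 100 contours with 50 vertices each.
N_CONTOURS, N_VERTICES = 100, 50
N_ROWS = N_CONTOURS * N_VERTICES
contours = pd.DataFrame({
    "contour_id": [f"c{i // N_VERTICES}" for i in range(N_ROWS)],
    "x": [float(i % N_VERTICES) for i in range(N_ROWS)],
    "y": [float(i % 7) for i in range(N_ROWS)],
})

# The only state the user actually changes.
annotations = {"c3": "rod", "c17": "coccus"}

# Serializing the whole table vs. serializing only the mutable part.
full_payload = contours.assign(phenotype="unassigned").to_json()
small_payload = json.dumps(annotations)

if __name__ == "__main__":
    print(len(full_payload), len(small_payload))
```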

As for making the app public, that’s not up to me, but I’m trying to push my collaborators in that direction :wink:

Oh, while I’m at it: in one of my attempts, I subclassed dash.Dash so that it carries a container (holding the image, the contours and other data). Is this a good idea, at least for the read-only parts? It simplifies the code a bit.
