Hi everyone, long-time listener, first-time caller. I love Dash, and I’ve learned so much from the forums, but I’m encountering a problem that I just cannot make sense of.
I think Dash is inflating the data that I’m using in my callbacks, and I don’t understand why.
I have an app/page with two DataTables: an “underlying data” table (id: my-datatable) and an “overview” table (id: summary-datatable).
I have a callback that takes in the `data` and `derived_virtual_data` of my-datatable, converts them into pandas DataFrames, does some manipulation (removing certain duplicates) and calculations (conditional counts and conditional sums of specific columns), and then outputs the results to summary-datatable.
Running locally on my computer, everything is fine. But when I run it through nginx, I get the following error: “413 Request Entity Too Large”.
I looked at the size of the request it was trying to send: ~76MB! The data in the table is only 6-10MB on any given day, so even sending both `data` and `derived_virtual_data` should only come to 12-20MB max.
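(For reference, one rough way to estimate the serialized size from Python, as a sketch using `df1` from the snippet below; nothing Dash-specific:)

import pandas as pd  # df1 is the DataFrame from get_data() below

# Records-style JSON repeats every column name on every row, so the
# serialized payload can be several times the DataFrame's in-memory size.
size_mb = len(df1.to_json(orient='records').encode('utf-8')) / 1e6
print(f"~{size_mb:.1f} MB as JSON")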
I increased nginx’s `client_max_body_size`, which allowed the page to load. But now it loads very slowly, and I think the ~76MB transfers are the reason.
Does anyone have thoughts on why the transfer of `data` and `derived_virtual_data` would be so inflated?
Note: I excluded the layout divs from the code snippet to minimize the code you have to look at. Below should be everything relevant to the problem; let me know if you think there’s more I need to add.
# imports shown for completeness (Dash 2.x style);
# `engine` is a SQLAlchemy engine defined elsewhere
import pandas as pd
from dash import dash_table, dcc
from dash.dash_table import FormatTemplate
from dash.dependencies import Input, Output, State

# query runs and converts to a DataFrame on initial page load and browser refresh
def get_data():
    sql = open('OAD_Display.sql', 'r').read()
    df = pd.read_sql(sql, engine)
    return df

df1 = get_data()
dash_table.DataTable(
    id='my-datatable',
    columns=(
        [{'name': ['Context', i], 'id': i, 'editable': False} for i in df1.columns[0:1]] +
        [{'name': ['Context', i], 'id': i, 'editable': False, 'hideable': True} for i in df1.columns[1:12]] +
        [{'name': ['Impact', i], 'id': i, 'editable': False} for i in df1.columns[12:15]] +
        [{'name': ['Impact', i], 'id': i, 'editable': False, 'type': 'numeric', 'format': FormatTemplate.money(0)} for i in df1.columns[15:18]] +
        [{'name': ['Resolution', i], 'id': i, 'editable': True} for i in df1.columns[18:19]] +
        [{'name': ['Resolution', i], 'id': i, 'editable': True, 'presentation': 'dropdown'} for i in df1.columns[19:20]]
    ),
    export_format='xlsx',
    merge_duplicate_headers=True,
    data=get_data().to_dict('records'))  # fresh query for data; columns reuse df1's schema
dash_table.DataTable(
    id='summary-datatable',
    columns=[
        {'name': ['Cost', 'Metric'], 'id': 'Cost-Metric', 'editable': False},
        {'name': ['Cost', 'Amount'], 'id': 'Cost-Amount', 'editable': False, 'type': 'numeric', 'format': FormatTemplate.money(0)},
        {'name': ['Cost', 'Percent'], 'id': 'Cost-Percent', 'editable': False, 'type': 'numeric', 'format': FormatTemplate.percentage(1)},
        {'name': ['Stats', 'Metric'], 'id': 'Stats-Metric', 'editable': False},
        {'name': ['Stats', 'Quantity'], 'id': 'Stats-Quantity', 'editable': False},
        {'name': ['Stats', 'Percent'], 'id': 'Stats-Percent', 'editable': False, 'type': 'numeric', 'format': FormatTemplate.percentage(1)},
    ],
    merge_duplicate_headers=True,
    data=[{}],  # summary table is blank on initial load
    export_format='xlsx')
# query_store just makes sure the virtual data pulled into the callback is the
# virtual data from *after* the user enters a filter query, not before
dcc.Store(id='query_store')
@app.callback(
    Output('query_store', 'data'),
    [Input('my-datatable', 'filter_query')]
)
def update_query_store(filter_query):
    return filter_query
@app.callback(
    Output('summary-datatable', 'data'),
    [Input('query_store', 'data')],
    [State('my-datatable', 'derived_virtual_data'),
     State('my-datatable', 'data')])
def update_summary(filter_query, derived_virtual_data, data):
    if derived_virtual_data is None or derived_virtual_data == []:
        return [{}]
    else:
        return_data = []
        df_user_view_data = pd.DataFrame.from_dict(derived_virtual_data)
        df_original_data = pd.DataFrame.from_dict(data)
        uv_unique_WOID_df = df_user_view_data.drop_duplicates(subset='WarpObjectID', keep='first')
        og_unique_WOID_df = df_original_data.drop_duplicates(subset='WarpObjectID', keep='first')
        uv_Total_Orphan_Cost = uv_unique_WOID_df['Orphan Cost'].sum(axis=0)
        og_Total_Orphan_Cost = og_unique_WOID_df['Orphan Cost'].sum(axis=0)
        uv_Unique_Orphan_Instances = uv_unique_WOID_df['WarpObjectID'].count()
        og_Unique_Orphan_Instances = og_unique_WOID_df['WarpObjectID'].count()
        return_data.append({
            'Cost-Metric': 'Total Actionable Orphan Cost',
            'Cost-Amount': uv_Total_Orphan_Cost,
            'Cost-Percent': uv_Total_Orphan_Cost / og_Total_Orphan_Cost,
            'Stats-Metric': 'Unique Actionable Orphan Instances',
            'Stats-Quantity': uv_Unique_Orphan_Instances,
            'Stats-Percent': uv_Unique_Orphan_Instances / og_Unique_Orphan_Instances,
        })
        object_types = ['PO Line', 'Work Order', 'Inventory', 'Master Schedule', 'RFPO', 'Released Planned Order']
        for ot in object_types:
            uv_TOC_ot = uv_unique_WOID_df.loc[uv_unique_WOID_df['Type'] == ot, 'Orphan Cost'].sum(axis=0)
            uv_UOI_ot = uv_unique_WOID_df.loc[uv_unique_WOID_df['Type'] == ot, 'WarpObjectID'].count()
            return_data.append({
                'Cost-Metric': 'Total Actionable Orphan Cost - ' + ot,
                'Cost-Amount': uv_TOC_ot,
                'Cost-Percent': uv_TOC_ot / og_Total_Orphan_Cost,
                'Stats-Metric': 'Unique Actionable Orphan Instances - ' + ot,
                'Stats-Quantity': uv_UOI_ot,
                'Stats-Percent': uv_UOI_ot / og_Unique_Orphan_Instances,
            })
        return return_data
The query_store is just a workaround to ensure that derived_virtual_data is the data from after the filter has been applied; without it, the derived virtual data is always one filter query behind. I don’t think it has anything to do with the problem.
The update_summary function takes the virtual data and the original data, creates a few additional DataFrames, and then performs the same operations on each of them. Each result is added as a row of the summary table.
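(Side note on the loop above: the per-type block amounts to a grouped sum/count; an equivalent sketch using the same column names:)

# Equivalent to the per-type loop: one grouped aggregation over 'Type'
per_type = uv_unique_WOID_df.groupby('Type').agg(
    total_orphan_cost=('Orphan Cost', 'sum'),
    unique_instances=('WarpObjectID', 'count'),
)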
I was thinking the problem was these additional DataFrames, but that doesn’t make sense to me: those DataFrames aren’t part of the input or the output of the callback. The input is the original and virtual data (plus the filter query string), and the output is a tiny table!
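In case it helps with diagnosis, here is a sketch of how the request size could be confirmed server-side. Both Inputs and States travel in the callback’s POST body; /_dash-update-component is the endpoint Dash callbacks POST to, and app.server is the underlying Flask app:

from flask import request

# Log the raw body size of every callback POST (sketch; runs on the
# underlying Flask app before each request is dispatched)
@app.server.before_request
def log_callback_payload_size():
    if request.path == '/_dash-update-component':
        print(f"callback payload: {len(request.get_data()) / 1e6:.1f} MB")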
Thanks for your help in advance!