Handling Large Dataframe Dash Tables

I am using the dash-table-experiments (https://github.com/plotly/dash-table-experiments). I have a large dataframe (~150,000 rows) which is used to fill the rows of the DataTable. Unfortunately, when I try to use all the rows I get a memory error or other problems using the sortable/filterable features of the DataTable. The thing is I don’t need to display all 150,000 rows in the DataTable, it’s fine to have a more manageable number like 1,000 or 10,000 but I need to be able to search/filter the entire dataframe not just the 1,000 or 10,000 used as the rows value. It would be awesome to have an option in the DataTable like “max_rows_to_present” which would enable the filtering of the entire database, but only present <= “max_rows_to_present”. Is this possible already? Could this feature be implemented? Do you have any ideas about how I could implement this myself?

Dash and the DataTables feature are super super super cool! @chriddyp

Maybe you could try hooking up a separate dcc.Input and dcc.Dropdown to your DataTable and doing the filtering and sorting server side?
Alternatively, you might be able to listen to the filters property of the DataTable (https://github.com/plotly/dash-table-experiments/blob/master/src/components/DataTable.react.js#L418) and update the rows accordingly.

Thanks for getting back to me!

Your first suggestion was my initial instinct and I very well may end up doing that since I want datepicker inputs instead of filters.

I’m looking at https://github.com/plotly/dash-table-experiments/blob/master/src/components/DataTable.react.js#L146

I’m a react noob, but do you think it would be possible to change this._absolute.rows to some new parameter (yet to exist) which represents all the rows, perhaps something like this._absolute.all_rows? I’m going to clone and play around with this!

I have exactly the same requirement, did you have any success in getting this working?

I resorted to doing something like the following where the filtering/sorting is handled separately with the knowledge of the whole dataframe…

import dash
from flask import Flask
import dash_table_experiments as dt
from dash.dependencies import Input, Output
import dash_core_components as dcc
import dash_html_components as html
import pandas as pd
df = pd.DataFrame({'number':[i for i in range(100000)],'value':[str(i)+'_value' for i in range(100000)],'value1':[str(i)+'_value1' for i in range(100000)]})
app = dash.Dash(__name__)
app.scripts.config.serve_locally = True
app.layout = html.Div([html.H4('Fake Table'),
    html.Div([html.P('Search a number (up to 100,000):  ', style={'display':'inline-block'}), dcc.Input(type='text', value='', id='input1'),
    html.Button('SORT',id='input2')], style={'display':'inline-block'}),
    dt.DataTable(
        rows=df.head(100).to_dict('records'),
        columns=sorted(df.columns),
        filterable=False,
        sortable=True,
        selected_row_indices=[],
        id='datatable'
    )])
def filter(val, clicks):
    """
    For user selections, return the relevant in-memory data frame.
    """
    if clicks:
        if clicks % 2 == 1:
            df.sort_values('number', ascending=False, inplace=True)
        else:
            df.sort_values('number', ascending=True, inplace=True)
    return df.loc[df.number.astype(str).str.contains(val)]
@app.callback(Output('datatable','rows'),[Input('input1', 'value'), Input('input2', 'n_clicks')])
def update(val, clicks):
    df = filter(val, clicks)
    return df.head(100).to_dict('records')



if __name__ == "__main__":
    app.run_server(debug=True)

Fantastic, thanks for the example, really appreciate it!