I am using a Dash DataTable in a web page that is updated from a dropdown callback, and the performance is very slow. The dataset is about 50 MB, with over 65,000 rows and 20 columns.
I noticed that the bottleneck was converting the pandas DataFrame to a dictionary for the DataTable to ingest on every filter iteration (I am doing server-side filtering). The conversion alone took about 15-20 seconds.
So I now convert the dataset to a list of dictionaries once, keep a reference to it, and filter that list on every callback. This is much faster, but the bottleneck now appears to be transferring the entire dataset to the client after every filter (about 8 seconds locally).
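For illustration, here is a minimal sketch of that convert-once, filter-per-callback approach. The column names, the toy data, and the `filter_records` helper are hypothetical stand-ins, not the actual app code; in the real app the list would come from something like `df.to_dict("records")` at startup.

```python
# Toy stand-in for the real dataset (hypothetical columns). In the app this
# list would be built once at startup, e.g. with df.to_dict("records").
ALL_RECORDS = [
    {"id": i, "region": "east" if i % 2 == 0 else "west", "value": i * 10}
    for i in range(10)
]


def filter_records(records, column, selected):
    """Return the rows whose `column` equals the dropdown selection.

    A selection of None means "no filter", so the cached list is returned
    unchanged instead of being copied.
    """
    if selected is None:
        return records
    return [row for row in records if row[column] == selected]


# What the dropdown callback would compute on each change.
east_rows = filter_records(ALL_RECORDS, "region", "east")
print(len(east_rows))
```

The filtering itself is cheap this way; what remains expensive is serializing and sending the full result to the browser.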
Is there a way to stream the data to the client, or to load only a subset and then, when the user scrolls outside the loaded window, trigger a callback that loads more rows?
I want to avoid pagination. I just want the user to be able to scroll through the entire dataset, but to improve performance I think I need to load only the rows that are currently needed.
I am thinking of something similar to this example: https://datatables.net/extensions/scroller/examples/initialisation/server-side_processing.html
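The server-side half of that idea might look like the sketch below. This is only an assumption about how the windowing could work: `get_window`, `start`, and `size` are hypothetical names, and how the table would report the scroll position that drives `start` is exactly the part I do not know.

```python
def get_window(records, start, size):
    """Return the rows in [start, start + size) plus the total row count.

    The total lets the client size its scrollbar for the full dataset even
    though only one window of rows is transferred at a time.
    """
    # Clamp the offset so an out-of-range scroll position yields an empty
    # window instead of an error or a negative-index slice.
    start = max(0, min(start, len(records)))
    return {"rows": records[start:start + size], "total": len(records)}


records = [{"id": i} for i in range(10)]  # toy data
window = get_window(records, start=8, size=5)
print(window["rows"], window["total"])
```

Each request would then transfer at most `size` rows, regardless of how many rows match the current filter.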
I do have virtualization enabled.
As a test of this idea, I modified the callback to return only the first 2000 rows of the filtered list, to reduce the data sent to the client (all filtering is still done on the full set of items). The performance is amazing, which leads me to believe that if I can figure out how to stream the data, it will solve the issue.
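A simplified sketch of that test, with toy data and hypothetical names (`rows_for_client`, `predicate`), not the actual callback:

```python
MAX_ROWS = 2000  # the cap used in the test described above


def rows_for_client(all_records, predicate, max_rows=MAX_ROWS):
    """Filter the full dataset, then keep only the first `max_rows` matches."""
    filtered = [row for row in all_records if predicate(row)]
    return filtered[:max_rows]


# Toy dataset of roughly the same row count as the real one.
records = [{"id": i, "even": i % 2 == 0} for i in range(65_000)]
payload = rows_for_client(records, lambda row: row["even"])
print(len(payload))  # 2000
```

The filter still scans all 65,000 rows, so the speedup comes entirely from the smaller payload.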
Also, if there are any other recommendations for using larger datasets with DataTable, I am open to all suggestions. If I should change my approach completely, please let me know.