We have built a wrapper around AgGrid [https://www.ag-grid.com/]
We have a huge dataset to load, ~400k rows and 250 columns. AgGrid can load and render this without a hiccup.
The dataset is large, and we are okay with ~1.5 GB request responses.
On the backend [server] the data is loaded into one huge df; JSON serialization of that df has become the pain point and seems to be consuming most of the time.
Why all the data at once? [It's a requirement on our side.]
Some stats:
- Files are split over ~100 CSVs, each 30 to 50 MB; read time for all files is ~13 seconds [multi-threaded].
- Concat of these into one huge df: ~4 seconds.
- df-to-JSON serialization: ~45 seconds [this is the problem; see the sketch below].
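If the 45 seconds comes from the stdlib `json` path (e.g. `json.dumps(df.to_dict("records"))`), it is worth benchmarking pandas' own C-backed `to_json` and `orjson` against it. Below is a minimal sketch of that comparison; the file pattern `data/*.csv`, the helper names, and the worker count are assumptions for illustration, not part of our actual code. Note that `to_dict("records")` itself materializes ~100M Python objects for a frame this size, so it is worth timing that step separately too.

```python
import glob
import time
from concurrent.futures import ThreadPoolExecutor

import orjson  # third-party: pip install orjson
import pandas as pd


def load_all_csvs(pattern: str = "data/*.csv") -> pd.DataFrame:
    """Read the ~100 CSV files in parallel and concat them (mirrors the stats above)."""
    paths = glob.glob(pattern)
    with ThreadPoolExecutor(max_workers=8) as pool:
        frames = list(pool.map(pd.read_csv, paths))
    return pd.concat(frames, ignore_index=True)


def serialize_with_pandas(df: pd.DataFrame) -> bytes:
    """pandas' JSON writer runs in C and is usually much faster than json.dumps(df.to_dict('records'))."""
    return df.to_json(orient="records").encode()


def serialize_with_orjson(df: pd.DataFrame) -> bytes:
    """orjson serializes numpy scalars directly when OPT_SERIALIZE_NUMPY is set."""
    return orjson.dumps(df.to_dict(orient="records"), option=orjson.OPT_SERIALIZE_NUMPY)


if __name__ == "__main__":
    df = load_all_csvs()
    for fn in (serialize_with_pandas, serialize_with_orjson):
        t0 = time.perf_counter()
        payload = fn(df)
        print(f"{fn.__name__}: {time.perf_counter() - t0:.1f}s, {len(payload) / 1e9:.2f} GB")
```

A further option, if the client side can be customized, is to skip JSON entirely and ship a binary format (Arrow/Feather or Parquet) that gets decoded in the browser, but that needs custom JS on the grid side.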
Has anybody tried solving this problem? If not, we would like to know which of the designs below is the best.
Things/designs we have tried or looked into:
- Dash-Extensions – ServerSideOutput [more or less caching]; we just load everything onto the UI once, and there is no to and fro between callbacks in our case.
- Looking at APIs on Flask that AgGrid can interact with, splitting the payload over multiple request/response cycles [easier to implement] rather than doing so in a Dash callback (see the first sketch after this list).
- Starlette / Flask with sockets/SSE to do the same as option 2 [a one-time activity]; this could come in handy in other places too (see the SSE sketch after this list).
- Not all columns are necessary, so we could load columns dynamically as needed; again, this requires option 2 or 3.
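For options 2 and 4, here is a minimal sketch of what the Flask side could look like: a route that slices the cached df by row range and an optional column list, so the grid pulls blocks instead of one 1.5 GB payload. The route paths, the query parameters (`start`, `end`, `cols`), the block size, and the Parquet cache file are all assumptions for illustration, not an existing API.

```python
from flask import Flask, Response, request
import pandas as pd

app = Flask(__name__)

# Assumption: the concatenated df is built once at startup and kept in memory.
DF = pd.read_parquet("cache/full_dataset.parquet")  # or the CSV read/concat pipeline above


@app.route("/api/rows")
def get_rows() -> Response:
    """Return one block of rows as JSON records.

    Hypothetical query parameters:
      start, end - row range requested by the grid (e.g. AG Grid's startRow/endRow)
      cols       - optional comma-separated subset of columns to send
    """
    start = int(request.args.get("start", 0))
    end = int(request.args.get("end", start + 10_000))
    cols = request.args.get("cols")

    block = DF.iloc[start:end]
    if cols:
        block = block[cols.split(",")]

    # Serialization stays in C and only covers a small block per request.
    return Response(block.to_json(orient="records"), mimetype="application/json")


@app.route("/api/rowcount")
def get_rowcount():
    """Tell the client the total size so it can page through all blocks."""
    return {"rows": len(DF), "columns": list(DF.columns)}
```

On the grid side this maps naturally onto AG Grid's infinite/server-side row models, which request data block by block; staying inside Dash, that would mean wiring the grid's block requests to this route via a callback or custom JS.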
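For option 3, Flask (and equally Starlette) can stream the same blocks over a single response instead of many request/response cycles. Below is a minimal SSE sketch, assuming custom JS on the client (an EventSource listener) appends each chunk to the grid, e.g. via AG Grid's applyTransaction. The event names, chunk size, and cache file are arbitrary assumptions.

```python
from flask import Flask, Response
import pandas as pd

app = Flask(__name__)
DF = pd.read_parquet("cache/full_dataset.parquet")  # same in-memory cache as in the previous sketch
CHUNK_ROWS = 20_000  # arbitrary block size; tune against payload size and render cadence


def sse_chunks():
    """Yield the df as a sequence of Server-Sent Events, one JSON block per event."""
    for start in range(0, len(DF), CHUNK_ROWS):
        block = DF.iloc[start:start + CHUNK_ROWS]
        yield f"event: rows\ndata: {block.to_json(orient='records')}\n\n"
    yield "event: done\ndata: {}\n\n"


@app.route("/api/stream")
def stream_rows() -> Response:
    # Passing a generator makes Flask stream the response instead of buffering it.
    return Response(sse_chunks(), mimetype="text/event-stream")
```

The client-side listener is needed regardless of whether the server is Flask or Starlette, so this mostly buys a single long-lived response instead of many round trips.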