Handling a large csv file with dash datatable

hannnnnnn · February 8, 2023, 9:55am

Hi,
I’m using Dash to let users sort and filter a table and export selected rows/columns from a CSV file that is 44 MB in size, with 87k rows and 44 columns.
I’m currently using the Dask DataFrame library to read the CSV file into a dataframe and displaying it with dash_table.DataTable. It’s faster than loading the table with pandas, but the app is still quite slow when it first opens.

Is there any way to handle a big CSV file with better speed?
Thank you for your help in advance.

AIMPED · February 8, 2023, 9:59am

Hi @hannnnnnn welcome to the forums.

Maybe you could look into this:

nedned · February 8, 2023, 2:19pm

You could try converting your CSV into a Parquet file first (e.g. using pandas), then reading it with pd.read_parquet on app startup. Parquet files contain the schema of the data within them, so unlike with CSV files, pandas (or whatever tool) doesn’t have to spend time inferring the data type of each column, which can be slow. Parquet is also column-oriented, which can improve load time, especially if you know there’s a column you don’t need and you tell pandas to skip it.

Your mileage may vary though, as the size of the change in load time will vary with the size of your data and data types of the columns.
