I’m using Dash to let users sort/filter a table and export selected rows/columns from a CSV file that is 44 MB in size and has 87k rows and 44 columns.
I’m currently using the Dask DataFrame library to read the CSV file as a dataframe and displaying it with dash_table.DataTable. It’s faster than loading the table with pandas, but it’s still quite slow when the app first opens.
Is there any way to handle a big CSV file with improved speed?
Thank you for your help in advance.
You could try converting your CSV into a Parquet file first (e.g. using pandas), then reading it with pd.read_parquet on app startup. Parquet files store the schema of the data within them, so unlike with CSV files, pandas (or whatever tool) doesn’t have to spend time inferring the data type of each column, which can be slow. Parquet is also column-oriented, which can improve load time, especially if you know there’s a column you don’t need and you tell pandas to skip it.
Your mileage may vary though, as the size of the change in load time will vary with the size of your data and data types of the columns.