Similar to the Apache Arrow and Dash - Community Thread, I wanted to open a thread to share strategies for using Parquet files in Dash.
I recently read this article: https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html (“Query Parquet files in SQLite”) and the results seem pretty compelling: great compression and query speed.
Has anyone else used Parquet files with their Dash apps? Please share your experiences!
We have been using Dash to visualize our data lake via AWS Athena (pyathena + pandas) as a nice alternative to BI tools or Jupyter notebooks. This "internal" project turned into a product, since getting quick interactive apps directly from the data lake is wonderful. However, although Athena is great, it is too slow for an interactive user experience (Apache Presto is designed for long-running queries and has a considerable cold start).
To get around that, we started querying the data lake directly with Apache Arrow, which can read Parquet data from S3 straight into a pandas DataFrame, and it is working quite well. The data stays in Parquet format, schemas are preserved, and the latency is acceptable.