Similar to the Apache Arrow and Dash - Community Thread, I wanted to open a thread to share strategies for using Parquet files in Dash.
I recently read this article: https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html (“Query Parquet files in SQLite”) and the results seem pretty compelling: great compression and query speed.
Has anyone else used Parquet files with their Dash apps? Please share your experiences!
We have been using Dash to visualize our data lake via AWS Athena (pyathena + pandas) as a nice alternative to BI tools or Jupyter notebooks. This "internal" project turned into a product, since getting quick interactive apps directly from the data lake is wonderful. However, although Athena is great, it is too slow for an interactive user experience (Apache Presto is designed for long-running queries and has a considerable cold start).
To get around that, we started querying the data lake directly with Apache Arrow, which can read Parquet data from S3 straight into a pandas DataFrame, and it is working quite well. The data stays in Parquet format, schemas are preserved, and the latency is acceptable.