- Parquet is usually the fastest format to read and write among those tested (Feather is close).
- A PyArrow Table requires no serialization, since the data is already in the Arrow format, but you pay an I/O cost to move it from shared memory into the requesting process.
- The clientside JSON store pattern only works if the data is small; otherwise you end up shipping the browser-stored data back and forth multiple times. Anything that stays on the server is likely to be faster than clientside JSON.
- Anything using Arrow is usually faster than static files, because all serialization happens in memory. However, Arrow doesn’t help much with larger-than-memory data, since the data cannot be held in memory. There, Parquet combined with Dask or Vaex is likely the best bet; SQLite may also work well, depending on the operations needed.
brain-plasma (mentioned above) provides a simple API for fast serialization of smaller-than-memory values; Dask and Vaex do a good job otherwise. Just remember that Arrow needs to be held in memory to be most useful. If speed and out-of-core processing matter most, perhaps consider Julia and JuliaDB, which has the fastest load times and out-of-core processing I’ve seen anywhere. It’s a tough problem to solve no matter how you swing it.