identical output with significantly better scalability and performance on large file reads from dask.read_csv()
pandas.read_csv() executes without error while dask.read_csv() fails
requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url: https://plot.ly/~bdun9/2754.csv
I’ve been experimenting with Plotly and Dash using the following repository as a sandbox: https://github.com/plotly/dash-vanguard-report.
While testing the performance difference between pandas and Dask, I ran into something that appears to be either an issue or an intentional security measure. When reading a CSV with dask.read_csv(url), Dask performs a HEAD request on the Plotly URL specified above. This is required to determine the Content-Length value so that Dask can parallelize its execution of pandas.read_csv(url), which gives a significant performance improvement when reading large files.
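To check the server's behavior independently of Dask, here is a minimal sketch that sends a HEAD request and reports the status code and Content-Length header. The helper name `probe_head` is hypothetical, and it uses only the standard library rather than whatever Dask does internally, so treat it as a diagnostic aid and not a reproduction of Dask's code path.

```python
import urllib.error
import urllib.request


def probe_head(url, timeout=10):
    """Send a HEAD request and return (status_code, content_length_header)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("Content-Length")
    except urllib.error.HTTPError as err:
        # A server that rejects HEAD answers with an error status such as 405.
        return err.code, err.headers.get("Content-Length")
```

Calling `probe_head` on the CSV URL above should return 405 as the status if the server rejects HEAD for that resource, which would match the error I am seeing.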
The error message was the following: “requests.exceptions.HTTPError: 405 Client Error: Method Not Allowed for url:…”
Is there a reason that HEAD requests are not allowed by the server on this csv url?
Any ideas for how I could resolve or work around this? If the performance improvement from using Dask is as good as the hype, it could IMO have great utility for Plotly and Dash.
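One workaround I am considering, sketched below, is to download the file with a plain GET and point Dask at the local copy, so that no HEAD request is needed. This assumes the server still allows GET, as the successful pandas.read_csv() call suggests. The helper names `download_csv` and `read_csv_via_local_copy` and the default path `data.csv` are hypothetical. I have not confirmed this is the recommended approach.

```python
import shutil
import urllib.request


def download_csv(url, dest_path, timeout=30):
    """Fetch url with a plain GET and stream the body to dest_path."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


def read_csv_via_local_copy(url, dest_path="data.csv"):
    """Download the CSV once, then let Dask partition the local file."""
    # Imported here so download_csv stays usable without Dask installed.
    import dask.dataframe as dd

    return dd.read_csv(download_csv(url, dest_path))
```

The obvious cost is that the download itself is no longer parallel; only the parsing and later computation are.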