Download large csv file

I am building an app with Plotly Dash. The app has an option to save a csv file (created while the app is running), but I have a problem saving it.

I tried using an html.A component and setting its href attribute to this data:

csv_string = df.to_csv(encoding='utf-8', index=True)
csv_string = "data:text/csv;charset=utf-8," + urllib.parse.quote(csv_string)

csv_string is the value I assign to the href attribute. I saw someone recommend this approach, and it really does seem to work.

The problem appears when the data frame is too big. When that happens, the browser reports a download error when trying to save the file.

  1. Do you think I have diagnosed the problem correctly? Could it really be a size problem?
  2. What can I do to fix this? Is there another way to save the file? I don’t want to write the file to a static folder. I need a solution that downloads the file to the user’s default download folder, or lets the user choose the destination folder (perhaps through a pop-up dialog).

I found this link: http://chandrewz.github.io/blog/downloading-large-csv-files-via-href which describes exactly the problem I have. Is there a solution in Python similar to what the author suggests?
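To check whether size is a plausible cause, it can help to measure how long the href actually becomes. The following is only a minimal sketch using the standard library: the row data and the helper name build_data_uri are made up for illustration, and the length at which a given browser refuses a data URI is not something this snippet can tell you.

```python
import csv
import io
import urllib.parse


def build_data_uri(rows):
    """Serialize rows to CSV and wrap the text in a percent-encoded data URI."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    csv_text = buffer.getvalue()
    uri = "data:text/csv;charset=utf-8," + urllib.parse.quote(csv_text)
    return csv_text, uri


# Hypothetical data: a header plus 100,000 rows of two integer columns.
rows = [("a column", "another column")] + [(i, i * 2) for i in range(100_000)]
csv_text, uri = build_data_uri(rows)
print(f"CSV size:  {len(csv_text):,} characters")
print(f"href size: {len(uri):,} characters")
```

Percent-encoding turns every comma and line break into a three-character escape, so the href is always longer than the csv text it carries.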

For use in Dash, you can just create a client side callback that uses a blob. Here is an example using dash-extensions,

import io
import dash
import dash_html_components as html
import numpy as np
import pandas as pd

from dash.dependencies import Output, Input
from dash_extensions.snippets import download_store

# Create app.
app = dash.Dash(prevent_initial_callbacks=True)
app.layout = html.Div([html.Button("Download csv", id="btn")] + download_store(app, id="to_download"))


@app.callback(Output("to_download", "data"), [Input("btn", "n_clicks")])
def generate_csv(n_clicks):
    # Generate some data.
    data = np.column_stack((np.arange(10), np.arange(10) * 2))
    df = pd.DataFrame(columns=["a column", "another column"], data=data)
    # Convert data to a string.
    s = io.StringIO()
    df.to_csv(s, index=False)
    data_string = s.getvalue()
    # The output must follow this form for the download to work.
    return dict(filename="some_name.csv", data=data_string, type="text/csv")


if __name__ == '__main__':
    app.run_server(port=5001)

which you can install from pip,

pip install dash-extensions==0.0.8

@Emil Is there a way to use this without the flag prevent_initial_callbacks?

Yes, that flag is not important. You can just omit it.

@Emil Amazing! Thank you.
Do you know how I can use this solution to save images and zip files?
Right now I am saving images with this code:

out_img = BytesIO()
in_fig.savefig(out_img, format=format, dpi=dpi, bbox_inches=bbox_inches)
out_img.seek(0)  # rewind file
encoded = base64.b64encode(out_img.read()).decode("ascii").replace("\n", "")
return "data:image/png;base64,{}".format(encoded)

I assign the returned value to the href attribute.

I decided to wrap the code in a component for convenience. Now (version 0.0.10), you can simply use the Download component. Hence the previous example now looks like this,

import io
import dash
import dash_html_components as html
import numpy as np
import pandas as pd

from dash.dependencies import Output, Input
from dash_extensions import Download

# Generate some example data.
data = np.column_stack((np.arange(10), np.arange(10) * 2))
df = pd.DataFrame(columns=["a column", "another column"], data=data)
# Create app.
app = dash.Dash(prevent_initial_callbacks=True)
app.layout = html.Div([html.Button("Download csv", id="btn"), Download(id="download")])


@app.callback(Output("download", "data"), [Input("btn", "n_clicks")])
def generate_csv(n_clicks):
    # Convert data to a string.
    s = io.StringIO()
    df.to_csv(s, index=False)
    content = s.getvalue()
    # The output must follow this form for the download to work.
    return dict(filename="some_name.csv", content=content, type="text/csv")


if __name__ == '__main__':
    app.run_server()

Could you provide an MWE of the figure/zip issue in question, i.e. one that includes the figure data?

@Emil in_fig is a matplotlib figure

I have just (version 0.0.11) updated the Download component to handle images as byte arrays. Here is a small example using matplotlib,

import io
import dash
import dash_html_components as html
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from dash.dependencies import Output, Input
from dash_extensions import Download

# https://stackoverflow.com/questions/49921721/runtimeerror-main-thread-is-not-in-main-loop-with-matplotlib-and-flask
matplotlib.use('Agg')
# Generate some example figure.
x = np.arange(10)
fig = plt.figure()
plt.plot(x, x ** 2)
# Create app.
app = dash.Dash(prevent_initial_callbacks=True)
app.layout = html.Div([html.Button("Download figure", id="btn"), Download(id="download")])


@app.callback(Output("download", "data"), [Input("btn", "n_clicks")])
def generate_csv(n_clicks):
    # Convert image to a byte array.
    out_img = io.BytesIO()
    fig.savefig(out_img, format="png", dpi=300)
    content = list(out_img.getvalue())
    # The output must follow this form for the download to work.
    return dict(filename="figure_name.png", content=content, type="image/png")


if __name__ == '__main__':
    app.run_server()
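
The zip part of the question is not covered by the example above. A possible sketch, assuming the Download component treats any byte array the way it treats the PNG bytes (this is an assumption, not something confirmed in this thread): build the archive in memory with the standard zipfile module and return its bytes in the same dict form. The helper name zip_payload and the file contents are hypothetical.

```python
import io
import zipfile


def zip_payload(files, filename="archive.zip"):
    """Pack {name: bytes} into an in-memory ZIP and build a download dict."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    # Same shape as the PNG example: the bytes are sent as a list of integers.
    return dict(filename=filename, content=list(buffer.getvalue()), type="application/zip")


# Hypothetical contents; in the app these could be the csv text and figure bytes.
payload = zip_payload({"data.csv": b"a,b\n1,2\n", "notes.txt": b"hello"})
```

A callback targeting Output("download", "data") could then return this dict directly.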

Hello @Emil, I like your Download extension.

While using it, I couldn’t find a way to skip returning a value for the “data” output.
Example:

@app.callback(Output("download", "data"), [Input("btn", "n_clicks")])
def generate_csv(n_clicks):
    if n_clicks:
        ...
        return dict(filename="some_name.csv", content=content, type="text/csv")
    else:
        return dict(filename="", content="", type="")

This makes my app download an empty .txt file whenever I start a session.
On the other hand, returning nothing throws an error.
What is the solution?

What is your version of Dash?

app = dash.Dash(prevent_initial_callbacks=True)

Does not work for me.

Cheers!

Thanks!

The “prevent_initial_callbacks” option was introduced in Dash 1.12.0, so you’ll need at least that version for my example to work. If you wish to abort a callback, you can raise the PreventUpdate exception (or return dash.no_update for outputs that should not be updated).
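
Applied to the callback from the question, the PreventUpdate route could look roughly like the sketch below. Only the callback body is shown; it would still need the @app.callback decorator from the earlier examples. The try/except around the import is a stand-in added here so the snippet also runs without Dash installed, and the csv content is a placeholder.

```python
try:
    from dash.exceptions import PreventUpdate
except ImportError:
    # Stand-in so the sketch runs without Dash installed.
    class PreventUpdate(Exception):
        pass


def generate_csv(n_clicks):
    """Callback body: abort unless the button has actually been clicked."""
    if not n_clicks:
        # No update is sent to the Download component, so nothing is downloaded.
        raise PreventUpdate
    content = "a column,another column\n0,0\n1,2\n"  # placeholder data
    return dict(filename="some_name.csv", content=content, type="text/csv")
```

With this guard, the else branch that returned an empty dict is no longer needed.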


Thanks Emil for this example. Would it be possible to also provide an example that shows how to export the data from a dash datatable?

Here is what I am trying to do:

  • When a user clicks on an export button, it launches a modal asking the user to enter his email
  • When the user clicks on a submit button in the modal and the email exists in the input form, then it downloads the data from the dash datatable as csv

What would be the best way to achieve the above? I thought about using the native export button of the dash datatable but there is no id associated with it.

Thanks for your help.

The simplest approach would probably be to convert the data table into a pandas data frame. You can then use the pandas writers (e.g. to_csv) to create the file for you in the desired format. Here is a small example,

import dash
import dash_table
import pandas as pd
import dash_html_components as html

from dash.dependencies import Output, Input, State
from dash_extensions import Download
from dash_extensions.snippets import send_data_frame

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/solar.csv')
# Create example app.
app = dash.Dash(__name__, prevent_initial_callbacks=True)
app.layout = html.Div([dash_table.DataTable(
    id='table', columns=[{"name": i, "id": i} for i in df.columns],  data=df.to_dict('records'),
), html.Button("Download", id="btn"), Download(id="download")])


@app.callback(Output("download", "data"), [Input('btn', 'n_clicks')], [State('table', 'data')])
def func(n_clicks, data):
    my_df = pd.DataFrame.from_records(data)
    return send_data_frame(my_df.to_csv, "my_df.csv", index=False)


if __name__ == '__main__':
    app.run_server()

What you are trying to achieve should be possible. The structure could be something like,

  • One callback with export button as Input, which targets the modal as Output (and thus triggers that the modal is shown)
  • Another callback with the submit button and the input form as Input and data table as State, which targets the download component as Output (and thus triggers the download)
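
The logic of the two callbacks above might be sketched as plain functions like this. It is only an outline under stated assumptions: the function names, the "@" check used as email validation, and the output filename are all hypothetical, and the functions still have to be wired to the real component ids with @app.callback.

```python
import csv
import io


def open_modal(n_clicks):
    """First callback: show the modal once the export button has been clicked."""
    return bool(n_clicks)


def export_rows(n_clicks, email, rows):
    """Second callback: build the download only if an email was entered."""
    if not n_clicks or not rows or not email or "@" not in email:
        return None  # in Dash, raise PreventUpdate here instead
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return dict(filename="table.csv", content=buffer.getvalue(), type="text/csv")
```

Here rows stands for the list of record dicts that the data table exposes through its data property.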

Thank you very much Emil for your prompt reply. I will give it a try.

Can this be adapted to download other types of files such as ZIP files?

I tried using dcc.send_file with a dcc.Download component, but it gives a “connection closed” error for large files. Smaller files work perfectly!

Any suggestions would be much appreciated.

You might try using a background callback. Maybe you are running into server timeouts.

Thanks! I already tried using background callbacks. No effect on the outcome: still “connection closed” after a few seconds…

Could you provide an MRE?

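import diskcache
from dash import Dash, DiskcacheManager, Input, Output, callback, dcc, html

# background=True requires a background callback manager (diskcache assumed here).
manager = DiskcacheManager(diskcache.Cache("./cache"))
app = Dash(__name__, background_callback_manager=manager)
app.layout = html.Div(
    [html.Button("Download Zipped pptx", id="dl-btn"), dcc.Download(id="download-pptx")]
)


@callback(
    Output('download-pptx', 'data'),
    Input('dl-btn', 'n_clicks'),
    background=True,
    prevent_initial_call=True,
)
def download_pptx(n_clicks):
    op_filename = 'largefile.zip'
    return dcc.send_file(op_filename)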
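
One thing that may be worth checking with this MRE: as far as I can tell, dcc.send_file reads the whole file into memory and sends it base64-encoded inside the callback response, so the response is about a third larger than the file itself. Whether that is what triggers the “connection closed” error is not established here. The sketch below only estimates the encoded size; base64_size is a hypothetical helper, and the random bytes stand in for the real zip file.

```python
import base64
import math
import os


def base64_size(n_bytes):
    """Length of the base64 text produced for n_bytes of binary input."""
    return 4 * math.ceil(n_bytes / 3)


# Check the formula against the real encoder on 1 MB of stand-in data.
raw = os.urandom(1_000_000)
encoded = base64.b64encode(raw)
print(f"raw: {len(raw):,} bytes, encoded: {len(encoded):,} characters")
print(f"estimate: {base64_size(len(raw)):,} characters")
```

Calling base64_size(os.path.getsize('largefile.zip')) gives a rough idea of how much data a single callback response would have to carry.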