Upload 500MB csv but content is empty

Hello, new member here! I started using this app about a month ago and everything was fine back then. Recently I updated it and wanted to check whether my data (500MB) still works, but the only thing I got is:
ValueError: not enough values to unpack (expected 2, got 1)

I thought maybe my code had an issue, so I tried my data with a simple app, but the result is still an error:

```python
import base64
import io
import pandas as pd
import plotly.express as px
from dash import html, dcc, Dash, Input, Output, State

app = Dash(__name__)

sidebar = html.Div([
    dcc.Upload(
        id='target_upload',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '45px',
            'lineHeight': '45px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '0px',
            'margin-right': '10px',
            'cursor': 'pointer'
        },
        accept=".csv",
        multiple=False
    )
])

app.layout = html.Div([sidebar, dcc.Graph(id='graph')])

@app.callback(
    Output('graph', 'figure'),
    [Input('target_upload', 'contents')],
    [State('target_upload', 'filename')]
)
def update_graph(contents, filename):
    print(contents)
    print(filename)
    if contents is None:
        return {}

    content_type, content_string = contents.split(',')
    decoded = base64.b64decode(content_string)

    if 'csv' in filename:
        # Assume that the user uploaded a CSV file
        data_set = pd.read_csv(io.StringIO(decoded.decode('utf-8')))
    print(data_set)
    return {}

if __name__ == '__main__':
    app.run_server(debug=False)
```

After I upload my data (hourfinal0910.csv), the output is:

```
None
None
[2024-03-24 18:17:22,060] ERROR in app: Exception on /_dash-update-component [POST]
Traceback (most recent call last):
  File "C:\Users\bruhq\anaconda3\Lib\site-packages\flask\app.py", line 2529, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\bruhq\anaconda3\Lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\bruhq\anaconda3\Lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\bruhq\anaconda3\Lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\bruhq\anaconda3\Lib\site-packages\dash\dash.py", line 1283, in dispatch
    ctx.run(
  File "C:\Users\bruhq\anaconda3\Lib\site-packages\dash\_callback.py", line 447, in add_context
    output_value = func(*func_args, **func_kwargs)  # %% callback invoked %%
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\bruhq\AppData\Local\Temp\ipykernel_9236\3795544345.py", line 46, in update_graph
    content_type, content_string = contents.split(',')
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 1)
127.0.0.1 - - [24/Mar/2024 18:17:22] "POST /_dash-update-component HTTP/1.1" 500 -
```

hourfinal0910.csv

And here is my hourfinal0910.csv info:

```
	deviceId	PM2.5	Date	lat	lon	cat.col	gas	adj_gas	people_total	population_denity	...	adj_industrial_count	adj_industrial_sum	metro	adj_metro	husbandry_count	adj_husbandry_count	Concrete_count	adj_Concrete_count	temple_count	adj_temple_count
0	11284763524	4.00	2023-09-01	23.904930	120.539231	彰化縣永靖鄉	327.0	193.000000	35879	1767.956772	...	0.200000	1.498000	0.0	0.000000	0.0	0.400000	1.0	0.200000	32.0	23.600000
1	9091113674	15.00	2023-09-01	23.962860	120.532280	彰化縣埔心鄉	392.0	281.600000	34037	1622.106497	...	0.000000	0.000000	0.0	0.000000	0.0	0.000000	0.0	1.000000	30.0	22.400000
2	9101369089	14.00	2023-09-01	24.049589	120.438044	彰化縣鹿港鎮	2868.0	9413.200000	85837	1034.314806	...	0.600000	8.600000	0.0	0.000000	0.0	0.200000	2.0	1.800000	73.0	36.600000
3	6821262718	4.70	2023-09-01	24.900941	121.204428	桃園市平鎮區	27787.0	28762.000000	228203	5251.477375	...	1.666667	180.373333	0.0	0.166667	0.0	0.333333	2.0	3.166667	16.0	26.166667
4	9099845309	7.00	2023-09-01	23.936440	120.430861	彰化縣二林鎮	512.0	152.750000	49082	532.360046	...	0.375000	27.875000	0.0	0.000000	0.0	0.375000	0.0	1.000000	46.0	28.250000
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2396563	12655828555	19.00	2023-10-31 20:00:00	23.619530	120.312400	雲林縣元長鄉	0.0	39.000000	24177	335.014373	...	0.333333	33.555556	0.0	0.000000	3.0	1.111111	0.0	0.888889	34.0	31.333333
2396564	11779094501	6.97	2023-10-31 20:00:00	24.794507	120.934573	新竹市香山區	20252.0	20512.000000	78641	1210.155300	...	0.600000	43.860000	0.0	0.000000	0.0	0.000000	2.0	2.000000	44.0	31.200000
2396565	9028518867	8.99	2023-10-31 20:00:00	24.991967	121.250616	桃園市中壢區	74085.0	30263.300000	422529	5552.275360	...	1.700000	242.404100	1.0	0.300000	0.0	0.500000	2.0	3.100000	41.0	19.800000
2396566	11324221843	11.64	2023-10-31 20:00:00	22.992742	120.187078	臺南市中西區	8089.0	13977.833333	77704	12193.528461	...	1.000000	168.065000	0.0	0.000000	0.0	0.000000	0.0	0.666667	94.0	60.833333
2396567	11986973389	5.44	2023-10-31 20:00:00	24.375212	120.680164	臺中市大甲區	7960.0	4014.800000	75516	1272.397567	...	0.200000	43.694000	0.0	0.000000	0.0	0.600000	0.0	1.200000	20.0	29.200000

2396568 rows × 32 columns
```

Maybe interesting:


Have you tried your app with a smaller CSV file? If that works, you can gradually increase the size of the file you upload: e.g. make a copy of hourfinal0910.csv, delete a quarter/half/three quarters of the rows, and see at which size the upload stops working.
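If deleting rows by hand in a 500MB file is awkward, a small script can write the reduced copies. This is a minimal sketch, not something from the original suggestion: `write_csv_fraction` is a hypothetical helper name, and it assumes a UTF-8 file with one record per line (no line breaks inside quoted fields). It streams the file line by line, so the whole CSV never has to fit in memory.

```python
from pathlib import Path


def write_csv_fraction(src: str, dst: str, fraction: float) -> int:
    """Write the header plus the first `fraction` of the data rows of `src` to `dst`.

    Assumes one record per physical line (no newlines inside quoted fields).
    Returns the number of data rows written.
    """
    src_path, dst_path = Path(src), Path(dst)

    # First pass: count data rows (all lines minus the header line).
    with src_path.open(encoding="utf-8", newline="") as fh:
        total_rows = sum(1 for _ in fh) - 1
    keep = max(int(total_rows * fraction), 0)

    # Second pass: copy the header and the first `keep` rows unchanged.
    with src_path.open(encoding="utf-8", newline="") as fin, \
            dst_path.open("w", encoding="utf-8", newline="") as fout:
        fout.write(fin.readline())
        for i, line in enumerate(fin):
            if i >= keep:
                break
            fout.write(line)
    return keep
```

For example, `write_csv_fraction("hourfinal0910.csv", "hourfinal0910_half.csv", 0.5)` would write the header plus the first half of the rows. Because the rows have similar lengths, the row fraction roughly tracks the file size, which is what matters for finding the point where the upload breaks.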

P.S. Could you please format your code as a code block? Add ``` before your code, and when the block is finished, close it with the three backticks.


I tried it: about 350MB works normally, but the error occurs at around 420MB. I even tried on a different device (another computer) and the result is the same :(

Thanks for the P.S., though! I also added some of my hourfinal0910.csv info to the topic, hoping this can be solved soon.

I am not sure about data limits in the dcc.Upload component. My limited knowledge tells me that it should be limited only by the memory your PC has available. Did you try different browsers?

Also, the error output is bothering me. The first two lines in the output log clearly show that contents and filename are None. If that is the case, the if statement should exit the callback, right? How come the code still advances to contents.split(",")?
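One possible reading of the log: Dash runs each callback once when the page loads (unless `prevent_initial_call=True` is set), so the `None`/`None` lines could be that first call returning `{}`, with the traceback coming from a second call triggered by the upload. The sketch below shows how that second call could fail, assuming `contents` arrived as an empty string (which would fit the thread title) rather than `None`: an empty string passes the `is None` check, but `"".split(",")` returns a one-element list. `parse_upload` is a hypothetical helper, not part of Dash.

```python
import base64
from typing import Optional, Tuple


def parse_upload(contents: Optional[str]) -> Optional[Tuple[str, bytes]]:
    """Split a dcc.Upload data-URL string ("<type>;base64,<data>") into
    (content_type, decoded_bytes).

    Returns None for None, an empty string, or a string without a comma,
    instead of letting the tuple unpacking raise ValueError.
    """
    if not contents or "," not in contents:
        return None
    # maxsplit=1: only the first comma separates the header from the payload
    content_type, content_string = contents.split(",", 1)
    return content_type, base64.b64decode(content_string)


# An empty string is not None, but splitting it yields a single element,
# so unpacking into two names fails with the same message as in the log.
print("".split(","))  # ['']
try:
    content_type, content_string = "".split(",")
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 1)
```

In the callback, `if not contents:` instead of `if contents is None:` would catch both cases; it would not make the large file upload, but it would turn the crash into an empty figure.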

Btw, do you need the app to work in the cloud, or is it run locally only? If it's local only, you could perhaps skip the upload completely. If you let the user pass the path to the file (for example via a simple input field), you can open it directly in the callback. The callback could then look like:

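```python
from pathlib import Path

import pandas as pd
from dash import Dash, Input, Output, dcc, html

app = Dash(__name__)
app.layout = html.Div([
    # Minimal layout: a text field where the user types the local CSV path
    dcc.Input(id='input_file_path', type='text', debounce=True),
    dcc.Graph(id='graph'),
])


@app.callback(
    Output('graph', 'figure'),
    Input('input_file_path', 'value'),
)
def update_graph(file_path: str):
    if not file_path:
        return {}

    # Read the file straight from disk; nothing passes through the browser
    if Path(file_path).suffix.lower() == ".csv":
        data_set = pd.read_csv(file_path)
    else:
        print(f"[ERROR] Wrong file selected, only csv files are allowed: {file_path}")
        return {}
    print(data_set)
    return {}
```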

Oh, another thought came up. The dcc.Upload component converts the uploaded data into a base64 string. This means that the size of the uploaded file and the size of the string that dcc.Upload stores in browser memory are not the same. With your original code, for the CSV files you were able to upload, you could check the length of contents, i.e. len(contents). This counts the number of characters in the string; if you take each character as 1 byte, every 1_048_576 characters is 1 MB. This gives you an approximation of how much memory your uploaded data consumes.

I have seen this size inflation be quite significant: base64 turns every 3 bytes into 4 characters, so the string alone is about a third larger than the file.
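The estimate described above can be sketched in a few lines. This is an illustration under stated assumptions: `base64_length` and `contents_size_mb` are hypothetical helper names, the count uses standard padded base64, and it counts 1 byte per character and 1_048_576 bytes per MB as in the post. It ignores the short `data:...;base64,` header that dcc.Upload puts in front of the payload.

```python
def base64_length(num_bytes: int) -> int:
    """Characters produced by standard (padded) base64 for `num_bytes` of input.

    Every started group of 3 input bytes becomes 4 output characters.
    """
    return 4 * ((num_bytes + 2) // 3)


def contents_size_mb(contents: str) -> float:
    """Approximate size of a dcc.Upload `contents` string in MB,
    counting 1 byte per character and 1_048_576 bytes per MB."""
    return len(contents) / 1_048_576


# Expected string size for the file sizes mentioned in this thread.
for file_mb in (350, 420, 500):
    encoded_mb = base64_length(file_mb * 1_048_576) / 1_048_576
    print(f"{file_mb} MB file -> about {encoded_mb:.0f} MB of base64 text")
```

By this arithmetic, the 350MB file that worked becomes roughly 467 MB of text and the 420MB file that failed becomes 560 MB. Inside the original callback, `print(contents_size_mb(contents))` right after the `None` check would show the actual figure for each file that does upload.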