CSV Upload to DataTable from Example Code: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xab in position 1: invalid start byte

Hi,

I’m having a hard time understanding why I get this error and the peculiar behavior that comes with it.
Problem: Uploading a CSV to update a DataTable gives this error:

[2020-03-02 20:19:18,271] ERROR in app: Exception on /_dash-update-component [POST]
Traceback (most recent call last):
  File "C:\Users\85022657\AppData\Local\Continuum\anaconda3\lib\site-packages\flask\app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\85022657\AppData\Local\Continuum\anaconda3\lib\site-packages\flask\app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\85022657\AppData\Local\Continuum\anaconda3\lib\site-packages\flask\app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Users\85022657\AppData\Local\Continuum\anaconda3\lib\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "C:\Users\85022657\AppData\Local\Continuum\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\85022657\AppData\Local\Continuum\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\85022657\AppData\Local\Continuum\anaconda3\lib\site-packages\dash\dash.py", line 1459, in dispatch
    response.set_data(self.callback_map[output]["callback"](*args))
  File "C:\Users\85022657\AppData\Local\Continuum\anaconda3\lib\site-packages\dash\dash.py", line 1339, in add_context
    output_value = func(*args, **kwargs)  # %% callback invoked %%
  File "C:\Users\85022657\OneDrive - BAT\Projects\Australia - Pricing Simulator\Code\Deployment\Deliverables\pipeline\le1.py", line 38, in update_output
    return df_decoded.to_dict("records")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xab in position 1: invalid start byte
127.0.0.1 - - [02/Mar/2020 20:19:18] "POST /_dash-update-component HTTP/1.1" 500

Here is some code that reproduces it.
Steps: export the DataTable, change some values (or not), then upload the exported file.

import dash
import dash_core_components as dcc
import dash_html_components as html
import dash_table
from dash.dependencies import Input, Output, State
import base64
import io
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/solar.csv')

app = dash.Dash(__name__)

app.layout = html.Div(
    [
        dash_table.DataTable(
            id='table',
            columns=[{"name": i, "id": i} for i in df.columns],
            filter_action="native",
            data=df.to_dict("records"),
            export_format="csv"
        ),

        dcc.Upload(
            id='upload-data',
            children=html.Div([
                'Drag and Drop or ',
                html.A('Select Files')
            ]),
        )
    ])

@app.callback(Output('table', 'data'),
              [Input('upload-data', 'contents')])
def update_output(contents):
    decoded = base64.b64decode(contents)
    df_decoded = pd.read_csv(io.StringIO(decoded.decode('utf-8')))
    return df_decoded.to_dict("records")

if __name__ == '__main__':
    app.run_server(debug=False)

Many thanks!

Hi @adrian.s, I found a similar topic on Stack Overflow: https://stackoverflow.com/questions/55782460/unicodedecodeerror-while-starting-the-app-under-windows-with-pycharm

It looks like there may be a native character that cannot be decoded as UTF-8.
I hope that link helps!

Brandon

It seems to be reproducible with any dataset, at least on my end. If you use unicode_escape instead, it skips loading the first column, no matter which column comes first. There seems to be an underlying problem with the beginning of the file, or so I believe.

Thanks for answering!

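The problem was with the .split(",") part, which the code I posted above leaves out. I’m not sure how exactly I messed it up, but that was the culprit.
The upload does not arrive as raw file bytes. It arrives as a string with a content-type prefix followed by the base64-encoded file, something like this: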

data:application/vnd.ms-excel;base64,77u/U3RhdGUsTnVtYmVyIG9mIFNvbGFyIFBsYW50cyxJbnN0YWxsZWQgQ2FwYWNpdHkgKE1XKSxBdmVyYWdlIE1XIFBlciBQbGFudCxHZW5lcmF0aW9uIChHV2gpDQpDYWxpZm9ybmlhLDI4OSw0Mzk1LDE1LjMsMQ0KQXJpem9uYSw0OCwxMDc4LDIyLjUsMQ0KTmV2YWRhLDExLDIzOCwyMS42LDENCk5ldyBNZXhpY28sMzMsMjYxLDcuOSwxDQpDb2xvcmFkbywyMCwxMTgsNS45LDENClRleGFzLDEyLDE4NywxNS42LDENCk5vcnRoIENhcm9saW5hLDE0OCw2NjksNC41LDENCk5ldyBZb3JrLDEzLDUzLDQuMSwxDQo=
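
For anyone wondering where byte 0xab comes from: a small sketch, assuming Python's default lenient `base64.b64decode` (which silently skips characters outside the base64 alphabet, such as `:`, `;`, `.`, `-` and `,`). Feeding it the prefix on its own shows that the letters in `data:application/...` get decoded as if they were payload. The variable names here are only for illustration.

```python
import base64

# The part of the upload string that comes before the comma.
prefix = "data:application/vnd.ms-excel;base64,"

# Lenient decoding drops the punctuation and decodes the remaining
# 32 alphabet characters into 24 junk bytes.
junk = base64.b64decode(prefix)
print(len(junk))   # 24
print(junk[:3])    # b'u\xabZ'  (this is what "data" decodes to)

# Those junk bytes are not valid UTF-8, which reproduces the error.
error_message = ""
try:
    junk.decode("utf-8")
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)
```

The message printed at the end names byte 0xab at position 1, the same as in the traceback. The junk bytes sit in front of the real file content, which would also explain why the first column is the one that gets mangled.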

Splitting on the comma removes the first bit of the string (everything before the comma), so only the base64 payload gets decoded. It works like a charm afterwards.
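
A minimal sketch of the split-then-decode step, in case it helps someone else. `decode_upload` is a hypothetical helper name, not something from Dash. Using `utf-8-sig` is my assumption: the payload above starts with `77u/`, which is the base64 form of the UTF-8 byte-order mark, and `utf-8-sig` strips that mark so it does not end up in the first column name. The sample string is a shortened stand-in for a real upload.

```python
import base64


def decode_upload(contents: str) -> str:
    """Return the file text from an upload string shaped like
    'data:<content type>;base64,<payload>'."""
    # Keep only what follows the first comma; the header is not base64 data.
    _header, payload = contents.split(",", 1)
    raw = base64.b64decode(payload)
    # utf-8-sig behaves like utf-8 but drops a leading byte-order mark.
    return raw.decode("utf-8-sig")


# Shortened stand-in for an exported CSV, with a BOM in front.
csv_text = "State,Number of Solar Plants\r\nCalifornia,289\r\n"
sample = "data:application/vnd.ms-excel;base64," + base64.b64encode(
    b"\xef\xbb\xbf" + csv_text.encode("utf-8")
).decode("ascii")

text = decode_upload(sample)
print(text.splitlines()[0])
```

Inside the callback, the result would then go to pandas as before, e.g. `pd.read_csv(io.StringIO(decode_upload(contents))).to_dict("records")`. The callback also fires once on page load with `contents` set to `None`, so it is probably worth guarding against that first, for example by raising `dash.exceptions.PreventUpdate`.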