How to reproduce 'content' Output of dcc.Upload?

enq · February 25, 2021, 9:49am

I am not able to reproduce the exact output (‘content’) of the dcc.Upload component.

If I upload the file my_excel.xlsx to the dcc.Upload component, my callback receives a “base64 encoded string” (according to the dcc.Upload documentation). I don’t know how to reproduce the exact same string without the dcc.Upload component.

my current approach:

with open('tests/data/my_excel.xlsx', 'rb') as file:
    raw_data = file.read()
	
_, content_string = raw_data.split(',')  # this fails

I get the error TypeError: a bytes-like object is required, not 'str'

if I add

raw_data = base64.b64encode(raw_data)

before the split, I get the same error.

How do I get the exact same “base64 encoded string” without the dcc.Upload Component?

Thanks very much in advance

enq · March 2, 2021, 7:53am

Solution:

import base64
import io
import pandas as pd
import magic  # provided by the python-magic package

filepath = 'tests/data/my_excel.xlsx'

# Reproduce output of dcc.Upload Component
with open(filepath, "rb") as file:
    decoded = file.read()
content_bytes = base64.b64encode(decoded)
content_string = content_bytes.decode("utf-8")

mime = magic.Magic(mime=True)
mime_type = mime.from_file(filepath)
content_type = "".join(["data:", mime_type, ";base64"])

contents = "".join([content_type, ",", content_string])

# and now revert: convert contents to binary file stream
content_type, content_string = contents.split(",")
decoded = base64.b64decode(content_string)
df = pd.read_excel(io.BytesIO(decoded))

(based on this SO reply)
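
For test fixtures, the steps above can be wrapped in a pair of helpers. This is only a minimal sketch and not part of the original solution: the function names `make_upload_contents` and `parse_upload_contents` are hypothetical, and it swaps `python-magic` for the standard-library `mimetypes` module, which guesses the type from the file extension rather than from the file's bytes. The result may therefore differ from the MIME type a browser reports.

```python
import base64
import mimetypes
from pathlib import Path


def make_upload_contents(filepath):
    """Build a string shaped like 'data:<mime>;base64,<payload>'."""
    path = Path(filepath)
    # Assumption: guessing from the extension is good enough here;
    # fall back to a generic type when the extension is unknown.
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        mime_type = "application/octet-stream"
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def parse_upload_contents(contents):
    """Split at the first comma and decode the payload back to bytes."""
    content_type, content_string = contents.split(",", 1)
    return content_type, base64.b64decode(content_string)
```

Calling `parse_upload_contents(make_upload_contents(filepath))` should return the original file bytes, which can then be wrapped in `io.BytesIO` as in the solution above.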

drakebohan · December 23, 2021, 6:38am

The reason for this error is that Python 3 keeps text and binary data apart: strings are Unicode, while a file opened in 'rb' mode (like data sent over the network) yields bytes. Calling raw_data.split(',') therefore mixes a bytes object with a str separator, and base64.b64encode() also returns bytes, which is why adding it does not help. You can convert bytes to a string with the decode() method of the bytes class. In Python 3 the default encoding is “utf-8”, so you can use directly:

b"python byte to string".decode("utf-8")

Python makes a clear distinction between bytes and strings. Bytes objects contain raw data (a sequence of octets), whereas strings are Unicode sequences. Conversion between these two types is explicit: you encode a string to get bytes, specifying an encoding (which defaults to UTF-8); and you decode bytes to get a string. Clients of these functions should be aware that such conversions may fail, and should consider how failures are handled.
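
As an illustration of the distinction, here is a small sketch that reproduces the error from the question on a made-up byte string (the sample data and variable names are placeholders, not taken from the original file) and shows two ways around it:

```python
import base64

# bytes, as returned by open(..., "rb").read(); sample data is made up
raw = b"col_a,col_b\n1,2\n"

# Mixing types fails: bytes.split() needs a bytes separator, not a str.
try:
    raw.split(",")
except TypeError as exc:
    error_message = str(exc)

# Option 1: stay in bytes and use a bytes separator.
parts_bytes = raw.split(b",")

# Option 2: base64-encode, then decode the resulting bytes to a str.
encoded_str = base64.b64encode(raw).decode("ascii")

# Decoding the base64 string gives back the original bytes.
round_trip = base64.b64decode(encoded_str)
```

Base64 output only contains ASCII characters, so decoding it with "ascii" or "utf-8" gives the same string.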
