dcc.Upload PDF file

AlexisRangelC · May 30, 2022, 6:41pm

Hi there! I’m struggling with the following situation:

I have a dcc.Upload that is meant to receive a PDF file, so that I can parse it with the camelot Python module, extract the tables in the PDF, and then convert them into pandas DataFrames to create graphs…

I have this function, which is part of the Dash layout:


def GetInput3():
    return html.Div([
        dcc.Upload(
            id='upload-data3',
            # UI text (Spanish): "Drag and drop or select the credit bureau report in PDF format"
            children=html.Div(id='drag_drop', children=[
                'Arrastra y suelta o ',
                html.A('selecciona el buró de crédito en formato PDF')
            ], style={'color': 'white'}),
            multiple=True
        ),
    ])

The drag-and-drop is meant to produce the tables extracted by the camelot module, so I defined a function that tries to get the main table and turn it into a DataFrame. I guess the problem is here, mainly in the way I decode the PDF. The same approach works for me in another process that handles Excel or CSV files, but not PDF files.

def parse_contents3(contents, filename, date):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'pdf' in filename:
            # Assume that the user uploaded a PDF file
            try:
                tables = camelot.read_pdf(io.StringIO(
                    decoded.decode('utf-8')), pages='all')
            except Exception as e:
                print(e)
                return html.Div([
                    'There was an error processing this file.'])
        else:
            return html.Div([
                'Try to set a PDF file.'
            ])

    except Exception as e:
        print(e)
        return html.Div([
            'There was an error processing this file.'
        ])

The function above is called by the callback that stores the main df, which is going to be the source for several graphs in the layout:

@app.callback(Output('c-store3', 'data'),
              [Input('upload-data3', 'contents')],
              [State('upload-data3', 'filename'),
               State('upload-data3', 'last_modified')])
def update_output(list_of_contents, list_of_names, list_of_dates):
    if list_of_contents is not None:
        children = [
            parse_contents3(c, n, d) for c, n, d in
            zip(list_of_contents, list_of_names, list_of_dates)]

        return children

Where do you see the issue?
Thanks in advance… Again, this process works for me with a CSV or XLSX file, but it doesn’t work with a PDF file.
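
A possible direction, offered as a sketch under assumptions rather than a confirmed fix: a PDF is binary data, so `decoded.decode('utf-8')` is likely to fail, and as far as I know `camelot.read_pdf` expects a file path (or URL) rather than a file-like object such as `io.StringIO`. The function above also never returns `tables` when parsing succeeds. One way around this is to write the decoded bytes to a temporary `.pdf` file and pass that path to camelot. The helper names `save_upload_to_temp_pdf` and `pdf_upload_to_records` are hypothetical and not part of Dash or camelot.

```python
import base64
import os
import tempfile


def save_upload_to_temp_pdf(contents: str) -> str:
    """Decode a dcc.Upload `contents` string and write the bytes to a temp .pdf file."""
    # `contents` looks like "data:application/pdf;base64,<payload>"
    _content_type, content_string = contents.split(",", 1)
    decoded = base64.b64decode(content_string)  # raw PDF bytes, not UTF-8 text
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(decoded)
        return tmp.name


def pdf_upload_to_records(contents: str) -> list:
    """Return one list of row dicts per table that camelot finds (assumes camelot is installed)."""
    import camelot  # imported lazily so the decoding helper works without it

    path = save_upload_to_temp_pdf(contents)
    try:
        tables = camelot.read_pdf(path, pages="all")
        # Each camelot table exposes its contents as a pandas DataFrame via `.df`
        return [table.df.to_dict("records") for table in tables]
    finally:
        os.remove(path)  # clean up the temporary file
```

Since `dcc.Store` holds JSON-serializable data, returning plain records like these from the callback (instead of `html.Div` components) may be easier to rebuild into DataFrames later with `pd.DataFrame(records)`.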

vnavdulov · February 2, 2023, 2:31pm

AnnMarieW · February 2, 2023, 2:46pm

Hi @vnavdulov

This is a great article! Lots of useful examples, including how to get metadata from a PDF. Nice summary at the end too:

Conclusion

Today we looked at one direction a data scientist can take to upload a PDF to a Plotly dashboard and display certain content to a user. This can be super useful for a person who has to pull specific information from PDFs (i.e., maybe customer email, name, and phone number) and needs to analyze that information. While this dashboard is basic, it can be tweaked for different styles of PDFs. For example, maybe you want to create a parsing dashboard that displays the citations in a research paper PDF you are reading so you can either save or use those citations to find other papers. Try this code out today and if you add any cool functions, let me know!
