Issue with upload component and .txt file

faust · October 16, 2020, 9:33pm

Hello,

I’ve gone through the docs for dcc.Upload and was able to reproduce the example. However, when I upload my .txt file, all I get is the message “There was an error processing this file.”, with no traceback, because that message comes from the except block in my code. I suspect the encoding/decoding step is the cause and was wondering whether there is a way to upload without it. I normally load this data in Jupyter with pandas, but the same approach doesn’t seem to carry over to Dash. Any help would be greatly appreciated.

My code:

app.layout = html.Div([
    html.B('Please upload patient SNP data below.  The file format should be .txt'),
    html.Br(),
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        # Allow multiple files to be uploaded
        multiple=True
    ),
    html.Div(id='output-data-upload'),
])


def parse_contents(contents, filename):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'txt' in filename:
            # Assume that the user uploaded a whitespace-delimited text file
            df = pd.read_csv(
                io.StringIO(decoded.decode('utf-8'), delimiter = r'\s+'))
                            #dtype={'rsid':'str', 'chromosome':'object', 'position':'int', 'genotype':'str'}, comment='#'))
        elif 'xls' in filename:
            # Assume that the user uploaded an excel file
            df = pd.read_excel(io.BytesIO(decoded))
    except Exception as e:
        print(e)
        return html.Div([
            'There was an error processing this file.'
        ])

    return html.Div([
        html.H5(filename),
        #html.H6(datetime.datetime.fromtimestamp(date)),

        dash_table.DataTable(
            data=df.to_dict('records'),
            columns=[{'name': i, 'id': i} for i in df.columns]
        ),
        html.Div(
            [dcc.Markdown('''**There are {:.0f} matching SNPS**'''.format(len(df)))]
        ),

        html.Hr(),  # horizontal line

        # For debugging, display the raw contents provided by the web browser
        html.Div('Raw Content'),
        html.Pre(contents[0:200] + '...', style={
            'whiteSpace': 'pre-wrap',
            'wordBreak': 'break-all'
        })
    ])


@app.callback(Output('output-data-upload', 'children'),
              [Input('upload-data', 'contents')],
              [State('upload-data', 'filename')])
def update_output(list_of_contents, list_of_names):
    if list_of_contents is not None:
        children = [
            parse_contents(c, n) for c, n in
            zip(list_of_contents, list_of_names)]
        return children

How I normally convert the .txt file when using pandas:

data = pd.read_csv('my_genome.txt', sep='\t', dtype={'rsid':'str', 'chromosome':'object', 'position':'int', 'genotype':'str'}, comment='#')

The .txt file looks like this:

'# This data file generated by 23andMe at: Mon Mar 23 11:18:23 2020
'#
'# This file contains raw genotype data, including data that is not used in 23andMe reports.
'# This data has undergone a general quality review however only a subset of markers have been
'# individually validated for accuracy. As such, this data is suitable only for research,
'# educational, and informational use and not for medical or other use.
'#
'# Below is a text version of your data. Fields are TAB-separated
'# Each line corresponds to a single SNP. For each SNP, we provide its identifier
'# (an rsid or an internal id), its location on the reference human genome, and the
'# genotype call oriented with respect to the plus strand on the human reference sequence.
'# We are using reference human assembly build 37 (also known as Annotation Release 104).
'# Note that it is possible that data downloaded at different times may be different due to ongoing
'# improvements in our ability to call genotypes. More information about these changes can be found at:
'# https://you.23andme.com/p/dfad0fe3300c6e2e/tools/data/download/
'#
'# More information on reference human assembly builds:
'# GRCh37 - hg19 - Genome - Assembly - NCBI
'#
rsid chromosome position genotype
rs548049170 1 69869 TT
rs13328684 1 74792 --
rs9283150 1 565508 AA
i713426 1 726912 AA
rs116587930 1 727841 GG
rs3131972 1 752721 GG

faust · October 16, 2020, 10:32pm

Solved! I just had to get rid of the “if”, “elif”, “except” and then I was able to get a traceback which allowed me to see the issue.
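
For anyone landing here later: the thread does not say what the traceback showed. One thing that stands out in the code posted above is that the delimiter argument sits inside the io.StringIO(...) call rather than the pd.read_csv(...) call, and io.StringIO rejects that keyword with a TypeError. Below is a minimal sketch of the parsing step with the arguments moved to pd.read_csv, assuming a tab-separated file with '#' comment lines like the sample in the question. The helper name parse_txt_contents is hypothetical and not part of Dash, and the read_csv options are simply the ones from the Jupyter call in the question.

```python
import base64
import io
import traceback

import pandas as pd


def parse_txt_contents(contents):
    """Decode a dcc.Upload `contents` string and parse it as a tab-separated table.

    Hypothetical helper for illustration; assumes '#' comment lines and the
    four columns shown in the question's sample file.
    """
    # The upload arrives as "data:<mime>;base64,<payload>"; keep only the payload.
    _content_type, content_string = contents.split(',', 1)
    decoded = base64.b64decode(content_string)
    try:
        # Parsing options are keyword arguments of pd.read_csv, not io.StringIO.
        return pd.read_csv(
            io.StringIO(decoded.decode('utf-8')),
            sep='\t',
            comment='#',
            dtype={'rsid': 'str', 'chromosome': 'object',
                   'position': 'int', 'genotype': 'str'},
        )
    except Exception:
        # Print the full traceback to the console, then re-raise so the
        # failure is not hidden behind a generic message.
        traceback.print_exc()
        raise
```

Inside the Dash callback, the returned DataFrame can be passed to dash_table.DataTable the same way as in the original parse_contents. Re-raising in the except block (or dropping the try/except while debugging, as described above) keeps the real error visible.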
