Issue with upload component and .txt file

Hello,

I’ve gone through the docs for dcc.Upload and have been able to reproduce the example. However, when I try uploading my .txt file, all I get is the message “There was an error processing this file.” There is no traceback to go with it, because the example code catches the exception and returns that message instead. I believe this may have to do with the encoding step and was wondering if there is a way to upload without it. I normally read this data in Jupyter with pandas, but the way I usually do it doesn’t seem to carry over to Dash. Any help would be greatly appreciated.

My code:

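import base64
import io

import pandas as pd
from dash import Dash, Input, Output, State, dash_table, dcc, html

app = Dash(__name__)

app.layout = html.Div([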
    html.B('Please upload patient SNP data below.  The file format should be .txt'),
    html.Br(),
    dcc.Upload(
        id='upload-data',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%',
            'height': '60px',
            'lineHeight': '60px',
            'borderWidth': '1px',
            'borderStyle': 'dashed',
            'borderRadius': '5px',
            'textAlign': 'center',
            'margin': '10px'
        },
        # Allow multiple files to be uploaded
        multiple=True
    ),
    html.Div(id='output-data-upload'),
])


def parse_contents(contents, filename):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'txt' in filename:
            # Assume that the user uploaded a whitespace-delimited text file
            df = pd.read_csv(
                io.StringIO(decoded.decode('utf-8'), delimiter = r'\s+'))
                            #dtype={'rsid':'str', 'chromosome':'object', 'position':'int', 'genotype':'str'}, comment='#'))
        elif 'xls' in filename:
            # Assume that the user uploaded an excel file
            df = pd.read_excel(io.BytesIO(decoded))
    except Exception as e:
        print(e)
        return html.Div([
            'There was an error processing this file.'
        ])

    return html.Div([
        html.H5(filename),
        #html.H6(datetime.datetime.fromtimestamp(date)),

        dash_table.DataTable(
            data=df.to_dict('records'),
            columns=[{'name': i, 'id': i} for i in df.columns]
        ),
        html.Div(
            [dcc.Markdown('''**There are {:.0f} matching SNPS**'''.format(len(df)))]
        ),

        html.Hr(),  # horizontal line

        # For debugging, display the raw contents provided by the web browser
        html.Div('Raw Content'),
        html.Pre(contents[0:200] + '...', style={
            'whiteSpace': 'pre-wrap',
            'wordBreak': 'break-all'
        })
    ])


@app.callback(Output('output-data-upload', 'children'),
              [Input('upload-data', 'contents')],
              [State('upload-data', 'filename')])
def update_output(list_of_contents, list_of_names):
    if list_of_contents is not None:
        children = [
            parse_contents(c, n) for c, n in
            zip(list_of_contents, list_of_names)]
        return children

How I normally convert the .txt file when using pandas:

data = pd.read_csv('my_genome.txt', sep='\t', dtype={'rsid':'str', 'chromosome':'object', 'position':'int', 'genotype':'str'}, comment='#')

The .txt file looks like this:

# This data file generated by 23andMe at: Mon Mar 23 11:18:23 2020
#
# This file contains raw genotype data, including data that is not used in 23andMe reports.
# This data has undergone a general quality review however only a subset of markers have been
# individually validated for accuracy. As such, this data is suitable only for research,
# educational, and informational use and not for medical or other use.
#
# Below is a text version of your data. Fields are TAB-separated
# Each line corresponds to a single SNP. For each SNP, we provide its identifier
# (an rsid or an internal id), its location on the reference human genome, and the
# genotype call oriented with respect to the plus strand on the human reference sequence.
# We are using reference human assembly build 37 (also known as Annotation Release 104).
# Note that it is possible that data downloaded at different times may be different due to ongoing
# improvements in our ability to call genotypes. More information about these changes can be found at:
# https://you.23andme.com/p/dfad0fe3300c6e2e/tools/data/download/
#
# More information on reference human assembly builds:
# https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13/
#
rsid chromosome position genotype
rs548049170 1 69869 TT
rs13328684 1 74792 --
rs9283150 1 565508 AA
i713426 1 726912 AA
rs116587930 1 727841 GG
rs3131972 1 752721 GG
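
For a file shaped like this, the main difference from the Jupyter workflow is that dcc.Upload hands the callback a base64 data-URL string rather than a path. The sketch below is one possible way to decode that string and pass the same parsing options to pandas; the helper name read_snp_upload and the shortened sample text are made up for illustration, and sep=r'\s+' is used so that tab- or space-separated columns both parse.

```python
import base64
import io

import pandas as pd


def read_snp_upload(contents):
    """Decode a dcc.Upload-style data URL and parse it as a SNP table.

    Hypothetical helper for illustration; not part of Dash or pandas.
    """
    # contents looks like "data:text/plain;base64,<payload>"
    _content_type, content_string = contents.split(',', 1)
    decoded = base64.b64decode(content_string)
    # The parsing keywords belong to read_csv, not to StringIO.
    return pd.read_csv(
        io.StringIO(decoded.decode('utf-8')),
        sep=r'\s+',       # accept tabs or runs of spaces
        comment='#',      # skip the header comment lines
        dtype={'rsid': 'str', 'chromosome': 'object',
               'position': 'int', 'genotype': 'str'},
    )


# Shortened stand-in for the real file, encoded the way the browser would send it.
sample = (
    "# This data file generated by 23andMe\n"
    "#\n"
    "rsid\tchromosome\tposition\tgenotype\n"
    "rs548049170\t1\t69869\tTT\n"
    "rs9283150\t1\t565508\tAA\n"
)
contents = ('data:text/plain;base64,'
            + base64.b64encode(sample.encode('utf-8')).decode('ascii'))
df = read_snp_upload(contents)
print(df)
```

The resulting DataFrame can then go into dash_table.DataTable through df.to_dict('records'), as in the callback above.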

Solved! I just had to remove the “if”, “elif”, and “except” blocks; after that I got a traceback, which let me see the issue.
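
For anyone who lands here with the same symptom: an alternative to deleting the error handling is to keep it and capture the full traceback. The sketch below uses only the standard library; describe_failure is a made-up name, and the failing line reproduces one problem visible in the code as posted, where delimiter is passed to io.StringIO instead of pd.read_csv. Whether that was the only issue in the original app is not stated in this thread.

```python
import base64
import io
import traceback


def describe_failure(contents):
    """Run the decode step; return the traceback text on failure, else None.

    Hypothetical helper for illustration.
    """
    try:
        _content_type, content_string = contents.split(',', 1)
        decoded = base64.b64decode(content_string)
        # Same shape as the posted code: the keyword lands on StringIO,
        # which does not accept it, so this raises TypeError.
        io.StringIO(decoded.decode('utf-8'), delimiter=r'\s+')
    except Exception:
        # format_exc() keeps the exception type and message, unlike a
        # generic "There was an error processing this file." string.
        return traceback.format_exc()
    return None


contents = ('data:text/plain;base64,'
            + base64.b64encode(b'rsid chromosome\n').decode('ascii'))
report = describe_failure(contents)
print(report)
```

In a Dash callback the same text could be printed to the console or, during development, returned inside an html.Pre so it shows up in the page.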