
How to create a bar chart that counts the distinct values in a single column of a dataframe

I saw the code in the Plotly Dash DataTable user guide that lets users upload files and automatically plots a graph. I want to change the graph so that it counts the distinct values in a single column when a user uploads a file. For example:

Country
Guinea
Guinea
Guinea
Liberia
Liberia

I want the x-axis of the bar chart to show the country names and the y-axis to show how many times each country (Guinea and Liberia) appears. Can anyone show me how to do it?

Below is the user guide code:

import base64
import io
import dash
from dash.dependencies import Input, Output, State
import dash_core_components as dcc
import dash_html_components as html
import dash_table
import pandas as pd

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

app = dash.Dash(__name__, external_stylesheets=external_stylesheets)

app.layout = html.Div([
    dcc.Upload(
        id='datatable-upload',
        children=html.Div([
            'Drag and Drop or ',
            html.A('Select Files')
        ]),
        style={
            'width': '100%', 'height': '60px', 'lineHeight': '60px',
            'borderWidth': '1px', 'borderStyle': 'dashed',
            'borderRadius': '5px', 'textAlign': 'center', 'margin': '10px'
        },
    ),
    dash_table.DataTable(id='datatable-upload-container'),
    dcc.Graph(id='datatable-upload-graph')
])


def parse_contents(contents, filename):
    content_type, content_string = contents.split(',')
    decoded = base64.b64decode(content_string)
    if 'csv' in filename:
        # Assume that the user uploaded a CSV file
        return pd.read_csv(
            io.StringIO(decoded.decode('utf-8')))
    elif 'xls' in filename:
        # Assume that the user uploaded an excel file
        return pd.read_excel(io.BytesIO(decoded))


@app.callback([Output('datatable-upload-container', 'data'),
               Output('datatable-upload-container', 'columns')],
              [Input('datatable-upload', 'contents')],
              [State('datatable-upload', 'filename')])
def update_output(contents, filename):
    if contents is None:
        return [{}], []
    df = parse_contents(contents, filename)
    return df.to_dict('records'), [{"name": i, "id": i} for i in df.columns]


@app.callback(Output('datatable-upload-graph', 'figure'),
              [Input('datatable-upload-container', 'data')])
def display_graph(rows):
    df = pd.DataFrame(rows)

    if (df.empty or len(df.columns) < 1):
        return {
            'data': [{
                'x': [],
                'y': [],
                'type': 'bar'
            }]
        }
    return {
        'data': [{
            'x': df[df.columns[0]],
            'y': df[df.columns[1]],
            'type': 'bar'
        }]
    }


if __name__ == '__main__':
    app.run_server(debug=True)

You’ll want to use a histogram here rather than a bar chart. Pass your list of countries to the histogram as x and set histfunc="count".

Essentially, you’ll want to do this in your callback:

import plotly.graph_objects as go

...

fig = go.Figure()
fig.add_trace(go.Histogram(histfunc="count", x=list_of_countries))
return fig
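
For anyone adapting the user guide's display_graph callback, here is a minimal sketch of the same idea written as a plain figure dict (the style the guide code already uses) instead of go.Figure. The helper name histogram_figure and its column argument are made up for illustration, and it assumes the column to count is the first one unless told otherwise. The body could be dropped into the display_graph callback from the guide.

```python
import pandas as pd


def histogram_figure(rows, column=None):
    """Build a figure dict whose histogram counts each value of one column."""
    df = pd.DataFrame(rows)

    # Nothing uploaded yet: return an empty histogram
    if df.empty:
        return {'data': [{'x': [], 'type': 'histogram'}]}

    # Assumption: count the first column unless a column name is given
    column = column or df.columns[0]

    return {
        'data': [{
            # Pass the raw values, repeats included; the histogram counts them
            'x': df[column].tolist(),
            'type': 'histogram',
            'histfunc': 'count',
        }]
    }


# Rows in the same shape the DataTable 'data' property provides
rows = [{'Country': 'Guinea'}, {'Country': 'Guinea'}, {'Country': 'Guinea'},
        {'Country': 'Liberia'}, {'Country': 'Liberia'}]
fig = histogram_figure(rows)
```

Because the x values are strings, each distinct country should end up as its own bar, with the bar height being the number of rows for that country.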

If I want to use the .value_counts() function to count the distinct values, how can I apply it to the chart?

If you have a list with repeating names in it, there should be no need to use value_counts(). The Plotly histogram can take the list and work out how many times each unique value appears in it. It then puts each unique value in its own bin and plots how many times that value appears. Hope this helps!

Edit: I should clarify that you need to set histfunc to "count", as in the code I showed you above, to accomplish this!
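
That said, if you would still rather do the counting yourself with value_counts(), a rough sketch could look like the following. It keeps the bar trace from the user guide and feeds it the distinct values as x and their counts as y. The helper name value_counts_figure and the choice of the first column are assumptions for illustration, not something from the guide.

```python
import pandas as pd


def value_counts_figure(rows, column=None):
    """Build a bar figure dict from the value counts of one column."""
    df = pd.DataFrame(rows)

    # Nothing uploaded yet: return an empty bar chart
    if df.empty:
        return {'data': [{'x': [], 'y': [], 'type': 'bar'}]}

    # Assumption: count the first column unless a column name is given
    column = column or df.columns[0]

    # value_counts() returns a Series: index = distinct values, values = counts,
    # sorted from most to least frequent
    counts = df[column].value_counts()

    return {
        'data': [{
            'x': counts.index.tolist(),
            'y': counts.tolist(),
            'type': 'bar',
        }]
    }


rows = [{'Country': 'Guinea'}, {'Country': 'Guinea'}, {'Country': 'Guinea'},
        {'Country': 'Liberia'}, {'Country': 'Liberia'}]
bar_fig = value_counts_figure(rows)
# bar_fig['data'][0]['x'] -> ['Guinea', 'Liberia']
# bar_fig['data'][0]['y'] -> [3, 2]
```

Both routes should give the same bars for this data; the value_counts() version just makes the counts available in pandas if you need them elsewhere.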

Thanks for the help @Krichardson.