Polars df in a Dash Datatable

I’m in the process of converting one of our internal dashboards to use polars instead of pandas, and as part of that, I wanted to display some of the dataframes in a Dash DataTable. I assumed that, because Plotly now supports polars (excellent work, y’all!), Dash would also have some info on this. Turns out, however, that

  • a) this wasn’t quite as straightforward as I thought, and
  • b) there were 0 forum posts to help me out (which is usually the source of all of my Dash/Plotly wisdom).

Posting here how I did it in the hope that it will save someone else a headache and half a day’s work trying to figure this out.

Useful piece of information #1: Use to_dicts() instead of to_dict("records")

When you pass in a pandas dataframe to a Datatable, you use the method

dash_table.DataTable(df.to_dict('records'))

This doesn’t work for a polars dataframe. Polars also has a built-in method called to_dict(), but that doesn’t work either: it returns one dict keyed by column name (holding each column’s values, as a Series by default), not one dict per row. What you need is something that turns a polars dataframe into a list of dictionaries, one per row, which is exactly what the polars method to_dicts() does. The datatable component can work with that, so you can pass in the df like this

dash_table.DataTable(df.to_dicts())

and move on with your life.

But TBone! Do all of my favorite datatable properties still work?

Thus far, everything I have tried to use with this works except the tooltips (I haven’t yet cared enough to invest the time to figure that out, but I’m fairly sure it can be done. If someone figures it out, please chime in).

Useful piece of information #2: if you have any nested or complex columns in your polars df, stringify them to make them JSON serializable

Polars is designed to do great with complicated data structures such as lists or multi-level nested structs (polars’ version of a dictionary/JSON-ish object) in a column. You know what does less well with that? I’ll give you a hint: who among us hasn’t at some point received the dreaded TypeError: Object of type [FILL IN YOUR PERSONAL NEMESIS HERE] is not JSON serializable error?

TBone. WTF are you talking about? Can I use a polars df or not?

Yes, you can, but if you have a column that contains a nested data structure, you will get that error when trying to pass your df into a datatable. To use it, you will have to convert that column to its string representation. Since the df is not allowed to have nulls if you want to do this, and filling those was also not quite as straightforward as I hoped, I wrote a helper function that will fill the nulls and stringify the list and struct columns to create a “display_df”.

I Read This Far, Now Get to the Point: How Can I Use This?

Like in this minimal example:

from dash import Dash, dash_table
import polars as pl

df = pl.DataFrame([
    {"a": 1, "b": ["x", "y"], "c": {"lastName": "Mustermann", "firstName": "Marcus", "status": "deactivated"}},
    {"a": 2, "b": ["z", "h", "s"], "c": {"lastName": "Schmoe", "firstName": "Joe", "status": "active"}}
])

print(df)

# Metadata columns dropped from the real data (not present in this toy df)
METADATA_COLS = ["_rid", "_self", "_etag", "_attachments", "_ts"]

def convert_df_to_stringified_display_df(inputDf):
    # Only drop the metadata columns that actually exist:
    # recent polars versions raise ColumnNotFoundError when dropping a missing column
    displayDf = inputDf.drop([c for c in METADATA_COLS if c in inputDf.columns])

    for colName, dtype in displayDf.schema.items():
        if isinstance(dtype, pl.Struct):
            # Struct (dict-like) column -> JSON string
            expr = pl.col(colName).struct.json_encode()
        elif isinstance(dtype, pl.List):
            # List column -> "[x, y]"; cast elements to strings first so join() works for any inner type
            expr = "[" + pl.col(colName).cast(pl.List(pl.Utf8)).list.join(", ") + "]"
        elif dtype == pl.Utf8:
            expr = pl.col(colName)
        else:
            # Numbers, booleans etc. are left as they are
            continue
        # Replace nulls with a placeholder string and keep the original column name
        displayDf = displayDf.with_columns(expr.fill_null("Null").alias(colName))

    return displayDf


df_display = convert_df_to_stringified_display_df(df)

app = Dash(__name__)

# Build the column list from the display df, since that is the data the table receives
app.layout = dash_table.DataTable(df_display.to_dicts(), [{"name": i, "id": i} for i in df_display.columns])

if __name__ == '__main__':
    app.run(debug=True)

Hope this saves someone some time out of their busy day.

And if you read this far: good on you. Gold star :star2: and I hope you enjoyed it.


Hi @tbonethemighty

This is very helpful - thanks for posting!

Have you tried Dash AG Grid? It can handle some complex data types like lists and dicts in cells.


@AnnMarieW Thanks for the idea! No, actually I haven’t done much with AG Grid at all – been using Dash for over 5 years now (yes, really, I think my first production dashboard used dash version 0.18 or something ridiculous like that), so I’ve been using datatables since they came out and when AG Grid came out, it looked like overkill for my use cases and I was too lazy to transfer everything over :laughing: But you got me curious about this, so I decided to compare this as well. I tried to create a similar minimum-effort example here:

from dash import Dash, html, dash_table
import dash_ag_grid as dag
import polars as pl

df = pl.DataFrame([
    {"a": 1, "b": ["x", "y"], "c": {"lastName": "Mustermann", "firstName": "Marcus", "status": "deactivated"}},
    {"a": 2, "b": ["z", "h", "s"], "c": {"lastName": "Schmoe", "firstName": "Joe", "status": "active"}}
])

# print(df)

# Metadata columns dropped from the real data (not present in this toy df)
METADATA_COLS = ["_rid", "_self", "_etag", "_attachments", "_ts"]

def convert_df_to_stringified_display_df(inputDf):
    # Only drop the metadata columns that actually exist:
    # recent polars versions raise ColumnNotFoundError when dropping a missing column
    displayDf = inputDf.drop([c for c in METADATA_COLS if c in inputDf.columns])

    for colName, dtype in displayDf.schema.items():
        if isinstance(dtype, pl.Struct):
            # Struct (dict-like) column -> JSON string
            expr = pl.col(colName).struct.json_encode()
        elif isinstance(dtype, pl.List):
            # List column -> "[x, y]"; cast elements to strings first so join() works for any inner type
            expr = "[" + pl.col(colName).cast(pl.List(pl.Utf8)).list.join(", ") + "]"
        elif dtype == pl.Utf8:
            expr = pl.col(colName)
        else:
            # Numbers, booleans etc. are left as they are
            continue
        # Replace nulls with a placeholder string and keep the original column name
        displayDf = displayDf.with_columns(expr.fill_null("Null").alias(colName))

    return displayDf


df_display = convert_df_to_stringified_display_df(df)

app = Dash(__name__)

app.layout = html.Div([
    html.H6("Dash Datatable"),
    dash_table.DataTable(
        data=df_display.to_dicts(),
        columns=[{"name": i, "id": i} for i in df_display.columns]
    ),
    html.Br(),
    html.Hr(),
    html.H6("Dash AG Grid, un-stringified Polars Df"),
    dag.AgGrid(
        # rowData=df.to_dict("records"), # does not throw an error in the terminal running the code, but does in the browser (error loading layout)
        rowData=df.to_dicts(), # no conversion to stringified version, sending in the df as-is
        columnDefs=[{"field": i} for i in df.columns],
    ),
    html.Br(),
    html.Hr(),
    html.H6("Dash AG Grid, Stringified Polars Df"),
    dag.AgGrid(
        rowData=df_display.to_dicts(), # stringified version of the df
        columnDefs=[{"field": i} for i in df_display.columns],
    ),
])

if __name__ == '__main__':
    app.run(debug=True)

which looks like this:

So it seems like the same steps are required for AG Grid as for the Datatable, and the useful information above applies here too! If I pass in the polars df without stringifying the complex columns, the arrays (the list column) are fine, but for the struct column JS does that annoying thing of printing out [object Object]. (At a quick glance, and please note that I did absolutely no styling whatsoever on this example, the datatable makes much better use of the space on the page :wink: )

So I guess it would depend on what you need the table to do, but if you have heavily nested data structures (as, sadly, my data does), that step of stringifying the columns might still be needed for AG Grid as well as for the datatable. Does look like AG Grid has a ton of very cool options, though, so it may still be the better choice, depending on the use case.

Thanks for prompting me to take another look at it! Got my curiosity flowing now…

Hello @tbonethemighty,

As for the [object Object]: you see that because nothing is reading the object or converting it into something useful for display.

You can do this by adding a valueFormatter or cellRenderer.

Simple valueFormatter that should work:

"valueFormatter": {"function": "params.value ? JSON.stringify(params.value) : null"}

You could also leave this column out of the columnDefs entirely and instead add columns that split out its individual fields.

As far as the styling, you have lots of options to allow for better use of the space. :grin:


Hey @jinnyzor, thanks for the tips! I am familiar with the phenomenon, and I know I could add a formatter to those columns or split them out (although the methods I’m using in the stringify function are polars native functions – I haven’t tried to use JSON.stringify on a polars df column yet, so not sure what would happen. Might indeed be the simpler choice). The goal here was just to do an as-close-as-possible comparison to the code I was using for the datatable :wink: and see what that minimal example produced.

Same for the formatting – I am 100% sure that there are tons of formatting options to make the default rendering better, but the point was to look at the default. If and when I stand something up using the AG Grid, I will undoubtedly do a lot more to it than I did here to get it into shape for production :laughing:


Hi @tbonethemighty

Nice to see a long time Dash user! :tada:

If you have millions of rows, you might be interested in this article:

Note that with Dash AG Grid, you can simply access the dicts in column “c”, either like this: c["firstName"], or with dot notation: c.firstName

from dash import Dash, html
import dash_ag_grid as dag
import polars as pl

df = pl.DataFrame([
    {"a": 1, "b": ["x", "y"], "c": {"lastName": "Mustermann", "firstName": "Marcus", "status": "deactivated"}},
    {"a": 2, "b": ["z", "h", "s"], "c": {"lastName": "Schmoe", "firstName": "Joe", "status": "active"}}
])


app = Dash(__name__)

app.layout = html.Div([
    html.H6("Dash AG Grid"),
    dag.AgGrid(
        rowData=df.to_dicts(),
        columnDefs=[
            {"field": "a"},
            {"field": "b"},
            {"headerName": "Name", "valueFormatter": {"function": "params.data.c.firstName + ' ' + params.data.c.lastName"}},
            {"field": "c.status", "headerName": "Status"},
        ],
        columnSize="sizeToFit",
    ),
])

if __name__ == '__main__':
    app.run(debug=True)

For a quick start, see this Medium article:

To help get started switching from DataTable to AG Grid, you can find a bunch of helpful articles


Hi @tbonethemighty
Thank you for sharing this tutorial with the community. Reading it reminded me of another article and app that were written to highlight polars with Dash.


Nice! Thanks for all the resources, team!