I’m in the process of converting one of our internal dashboards from pandas to polars, and as part of that I wanted to display some of the dataframes in a Dash DataTable. Because Plotly now supports polars (excellent work, y’all!), I assumed Dash would also have some info on this. Turns out, however, that
- a) this wasn’t quite as straightforward as I thought, and
- b) there were 0 forum posts to help me out (and the forum is usually the source of all of my Dash/Plotly wisdom).
Posting here how I did it in the hope that it will save someone else a headache and half a day’s work trying to figure this out.
Useful piece of information #1: Use to_dicts() instead of to_dict("records")
When you pass a pandas dataframe to a DataTable, you convert it to a list of records first:
dash_table.DataTable(df.to_dict('records'))
This doesn’t work for a polars dataframe. Polars also has a built-in method called to_dict(), but that doesn’t work either, because it returns a dictionary keyed by column name rather than a list of rows. What you need is something that turns a polars dataframe into a list of dictionaries (one per row), which is what the polars method to_dicts() does. The DataTable component can work with that, so you can pass in the df like this:
dash_table.DataTable(df.to_dicts())
and move on with your life.
But TBone! Do all of my favorite datatable properties still work?
Thus far, everything I have tried works except the tooltips. I haven’t yet cared enough to invest the time to figure that out, but I’m fairly sure it can be done. If someone figures it out, please chime in.
Useful piece of information #2: if you have any nested or complex columns in your polars df, stringify them to make them JSON serializable
Polars is designed to do great with complicated data structures in a column, such as lists or multi-level nested structs (the polars version of a dictionary/JSON-ish object). You know what does less well with that? I’ll give you a hint: who among us hasn’t at some point received the dreaded TypeError: Object of type [FILL IN YOUR PERSONAL NEMESIS HERE] is not JSON serializable error?
TBone. WTF are you talking about? Can I use a polars df or not?
Yes, you can, but if you have a column that contains a nested data structure, you will get that error when trying to pass your df into a datatable. To use it, you will have to convert that column to its string representation. The df can’t contain nulls when you do this, and filling those was also not quite as straightforward as I had hoped, so I wrote a helper function that fills the nulls and stringifies the list and struct columns to create a “display_df”.
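If the frame is small, the same "stringify it" idea can also be sketched on the rows themselves, after calling to_dicts(). This is a different, row-level technique and not the polars helper from this post. It assumes that to_dicts() hands back plain Python lists and dicts for list and struct columns, and the function name stringify_nested_cells is made up for illustration.

```python
import json


def stringify_nested_cells(rows):
    """Return a copy of `rows` with every list/dict cell replaced by its JSON text."""
    return [
        {
            key: json.dumps(value) if isinstance(value, (list, dict)) else value
            for key, value in row.items()
        }
        for row in rows
    ]


# Hand-written rows in the shape to_dicts() returns for a frame
# with a list column ("b") and a struct column ("c").
rows = [
    {"a": 1, "b": ["x", "y"], "c": {"lastName": "Mustermann", "status": "deactivated"}},
    {"a": 2, "b": None, "c": {"lastName": "Schmoe", "status": "active"}},
]

flat_rows = stringify_nested_cells(rows)
print(flat_rows[0])
```

Scalar cells and nulls pass through untouched here, so null handling is still up to you. Doing the conversion inside polars, as in the helper, is likely the better fit for larger frames.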
I Read This Far, Now Get to the Point: How Can I Use This?
Like this minimal example here:
from dash import Dash, dash_table
import polars as pl

# Example frame with a list column ("b") and a struct column ("c")
df = pl.DataFrame([
    {"a": 1, "b": ["x", "y"], "c": {"lastName": "Mustermann", "firstName": "Marcus", "status": "deactivated"}},
    {"a": 2, "b": ["z", "h", "s"], "c": {"lastName": "Schmoe", "firstName": "Joe", "status": "active"}},
])
print(df)


def convert_df_to_stringified_display_df(inputDf):
    # Drop metadata columns, but only the ones that are actually present, so the
    # function also works on frames without them (like the example df above)
    metaCols = ["_rid", "_self", "_etag", "_attachments", "_ts"]
    displayDf = inputDf.drop([c for c in metaCols if c in inputDf.columns])

    for colName in displayDf.columns:
        # Fill nulls first; for list columns this raised an InvalidOperationError for me
        try:
            displayDf = displayDf.with_columns(pl.col(colName).fill_null("Null"))
        except pl.exceptions.InvalidOperationError as ioe:
            if str(ioe).startswith("cannot cast List type"):
                # List column: fill with an empty list instead
                displayDf = displayDf.with_columns(pl.col(colName).fill_null([]))
            else:
                print("unexpected dtype, check again!")

        # Complex dtypes get converted to their string representation
        dtypeName = str(displayDf[colName].dtype)
        if dtypeName not in ["String", "Int64", "Boolean"]:
            if dtypeName.startswith("Struct"):
                displayDf = displayDf.with_columns(pl.col(colName).struct.json_encode())
            elif dtypeName.startswith("List"):
                displayDf = displayDf.with_columns(
                    ("[" + pl.col(colName).cast(pl.List(pl.Utf8)).list.join(", ") + "]")
                    .alias(f"stringifiedList_{colName}")
                )

    # Swap each original list column for its stringified copy
    for colName in displayDf.columns:
        if colName.startswith("stringifiedList_"):
            originalName = colName.replace("stringifiedList_", "")
            displayDf = displayDf.drop(originalName)
            displayDf = displayDf.rename({colName: originalName})
    return displayDf


df_display = convert_df_to_stringified_display_df(df)

app = Dash(__name__)
app.layout = dash_table.DataTable(
    data=df_display.to_dicts(),
    columns=[{"name": i, "id": i} for i in df_display.columns],
)

if __name__ == '__main__':
    app.run(debug=True)
Hope this saves someone some time out of their busy day.
And if you read this far: good on you. Gold star and I hope you enjoyed it.