I am having a hard time understanding how to preprocess my Python Pandas dataframe to pull into a Sankey diagram script. I have a df of online consumer data with multiple rows pertaining to one consumer. The dataframe has over 50,000 consumer sample paths. Does anyone know how to approach this?
I hope you found the right approach. I'm dealing with a similar challenge now (a 5000 rows x 10 cols data frame) and it's definitely not a piece of cake.
Best!
The pre-processing approach depends on the form of the data you're starting with. A Sankey diagram is designed to represent a weighted graph (nodes, edges, and weights).
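As a rough sketch of what that could look like: assuming your dataframe has one row per consumer per step, you can pair each row with the next row for the same consumer and count how often each transition occurs. The column names `consumer_id`, `step`, and `page` and the sample values are made up for illustration, so swap in whatever your data actually uses.

```python
import pandas as pd

# Hypothetical sample data: one row per consumer per step in their path.
df = pd.DataFrame({
    "consumer_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "step":        [1, 2, 3, 1, 2, 1, 2, 3],
    "page":        ["home", "search", "cart", "home", "cart",
                    "home", "search", "cart"],
})

# Order each consumer's rows, then look up the next page within that consumer.
df = df.sort_values(["consumer_id", "step"])
df["next_page"] = df.groupby("consumer_id")["page"].shift(-1)

# Each (page, next_page) pair is an edge; its count is the edge weight.
# The last row of every consumer has no next page and is dropped.
links = (
    df.dropna(subset=["next_page"])
      .groupby(["page", "next_page"])
      .size()
      .reset_index(name="value")
)

# Sankey traces refer to nodes by integer position, so map labels to indexes.
labels = sorted(set(links["page"]) | set(links["next_page"]))
index = {label: i for i, label in enumerate(labels)}
sources = links["page"].map(index).tolist()
targets = links["next_page"].map(index).tolist()
values = links["value"].tolist()
```

From there, `labels` would go into the `node` label of a `plotly.graph_objects.Sankey` trace, and `sources`, `targets`, and `values` into the `source`, `target`, and `value` of its `link`. If consumers can revisit the same page, you may want to include the step in the node label (for example "home (step 1)") so the graph has no loops.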
You may also want to take a look at the new Parallel Categories diagram (see https://plot.ly/python/parallel-categories-diagram/). This has a superficially similar appearance to a Sankey diagram, but it's designed to represent multi-dimensional categorical datasets.
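If you go the Parallel Categories route, the data would need one row per consumer and one categorical column per dimension. A minimal sketch of that reshaping, again assuming hypothetical `consumer_id`, `step`, and `page` columns and treating each step as a dimension:

```python
import pandas as pd

# Hypothetical sample data: one row per consumer per step in their path.
df = pd.DataFrame({
    "consumer_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "step":        [1, 2, 3, 1, 2, 1, 2, 3],
    "page":        ["home", "search", "cart", "home", "cart",
                    "home", "search", "cart"],
})

# Reshape long -> wide: one row per consumer, one column per step.
wide = df.pivot(index="consumer_id", columns="step", values="page")
wide.columns = [f"step_{c}" for c in wide.columns]

# Shorter paths leave gaps; the "(exit)" placeholder is an arbitrary choice.
wide = wide.fillna("(exit)")
```

Each column of `wide` could then serve as one dimension of the diagram, for example by passing the frame to `plotly.express.parallel_categories`.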
Feel free to share more details on your dataset if you want to talk through it more,
-Jon