Sankey Diagram Data Preprocessing - Python

I am having a hard time understanding how to preprocess my Python Pandas dataframe to pull into a Sankey diagram script. I have a df of online consumer data with multiple rows pertaining to one consumer. The dataframe has over 50,000 consumer sample paths. Does anyone know how to approach this?

Thanks!

Andrew

I hope you found the right approach. I'm dealing with a similar challenge now (a 5000 rows x 10 cols data frame) and it's definitely not a piece of cake.
Best!

Hi @Andrewash46 and @mihalw28,

The pre-processing approach depends on the form of the data you’re starting with. A Sankey diagram is designed to represent a weighted graph (nodes, edges, and weights).
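As a rough sketch of what that could look like for path data: assuming each row can be reduced to a consumer id, a step order, and a state (all three field names and the sample values below are hypothetical, since the real columns weren't shared), you could count the transitions between consecutive steps and use those counts as the edge weights. Prefixing each state with its position in the path is one way to keep the graph free of cycles when a consumer revisits a state.

```python
from collections import Counter, defaultdict

def paths_to_sankey(records):
    """Turn (consumer_id, step, state) records into Sankey node/link lists."""
    # Collect each consumer's visits; rows may arrive in any order.
    paths = defaultdict(list)
    for consumer_id, step, state in records:
        paths[consumer_id].append((step, state))

    # Count every consecutive (source, target) transition across consumers.
    weights = Counter()
    for visits in paths.values():
        # Label each state with its position so repeat visits don't form cycles.
        ordered = [f"{pos}: {state}"
                   for pos, (_, state) in enumerate(sorted(visits), start=1)]
        weights.update(zip(ordered, ordered[1:]))

    # Links refer to nodes by their integer position in the label list.
    labels = sorted({node for pair in weights for node in pair})
    index = {label: i for i, label in enumerate(labels)}
    source = [index[s] for s, _ in weights]
    target = [index[t] for _, t in weights]
    value = list(weights.values())
    return labels, source, target, value

# Hypothetical sample: three consumers, one row per visited state.
records = [
    ("a", 1, "home"), ("a", 2, "product"), ("a", 3, "checkout"),
    ("b", 2, "product"), ("b", 1, "home"),
    ("c", 1, "search"), ("c", 2, "product"),
]
labels, source, target, value = paths_to_sankey(records)
```

With a DataFrame, the records could come from something like `df[["consumer_id", "step", "state"]].itertuples(index=False)` (again, assumed column names). The four lists then map onto `go.Sankey(node=dict(label=labels), link=dict(source=source, target=target, value=value))`.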

You may also want to take a look at the new Parallel Categories diagram (see https://plot.ly/python/parallel-categories-diagram/). It looks superficially similar to a Sankey diagram, but it's designed to represent multi-dimensional categorical datasets.

Feel free to share more details on your dataset if you want to talk through it more,
-Jon

Thanks for your response @jmmease.

The Parallel Categories diagram is a new one for me, and it looks very interesting in the examples. I'll try it with my data. Once again, thanks. :wink: