Using Pandas dataframe with a hierarchy for many different figures

I have searched the Pandas documentation and related sites but cannot find an answer to this, so I hope it is OK to ask here.

I have a number of business dashboards I need to build. All of the dataframes/data sets have some sort of natural hierarchy to them. For example, here is a typical set of columns:

  • Store, City, State, Region, Date, Sales

My figures may be:

  • Sales by Store by Month
  • Sales by City by Month
  • Sales by State by Month

So far, when I build figures at different levels of detail, I have had trouble using my original dataframe at its most granular level (Store), so I end up creating a separate dataframe per figure. This seems inefficient.

The problem I run into when using the original dataframe (Store, City, State, Region, Date, Sales) is that many of the figures display unwanted breaks or lines. For example, a bar chart of Sales by Store shows one bar per store, but each bar is split into multiple segments. Because the dataframe has one row per store per month, the chart looks like a stacked bar chart with each store's months stacked on top of one another.

Without addressing the issue of the stacked bar chart look, I am more interested in finding out how others work with dataframes that have these sorts of hierarchies.

  1. Are you able to use one very granular dataframe for multiple types of charts displaying multiple levels, like Store, City, State?
  2. Or do you do what I do and create a separate dataframe specifically for the figure and the level of detail (e.g. City or State) you want to display?
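One way to frame the two options above is to keep a single granular dataframe and aggregate it on demand for each figure. The following is only a minimal sketch under assumptions: the sample data, the `Month` column, and the helper name `sales_by_level` are hypothetical and not from the original post.

```python
import pandas as pd

# Hypothetical sample data at the most granular (Store) level.
df = pd.DataFrame({
    "Store": ["S1", "S1", "S2", "S2", "S3", "S3"],
    "City": ["Austin", "Austin", "Austin", "Austin", "Dallas", "Dallas"],
    "State": ["TX"] * 6,
    "Region": ["South"] * 6,
    "Date": pd.to_datetime([
        "2023-01-05", "2023-02-10", "2023-01-07",
        "2023-02-12", "2023-01-09", "2023-02-15",
    ]),
    "Sales": [100, 150, 200, 250, 300, 350],
})


def sales_by_level(frame, level, by_month=True):
    """Sum Sales at one hierarchy level (hypothetical helper).

    `level` is a column name such as "Store", "City", or "State".
    """
    keys = [level]
    if by_month:
        # Derive a year-month label so each (level, month) pair is one row.
        frame = frame.assign(Month=frame["Date"].dt.strftime("%Y-%m"))
        keys.append("Month")
    return frame.groupby(keys, as_index=False)["Sales"].sum()


by_city = sales_by_level(df, "City")                      # Sales by City by Month
by_state = sales_by_level(df, "State")                    # Sales by State by Month
store_totals = sales_by_level(df, "Store", by_month=False)  # one row per Store
```

Each result has exactly one row per plotted mark, so a bar chart built from `store_totals` should draw a single segment per store rather than one segment per month. The aggregated frames are still separate objects, but they are derived from the one granular dataframe when a figure needs them instead of being maintained by hand.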

Thanks for any thoughts.

I may have found the answer, but being inexperienced with Dash/Plotly and Pandas, I will post a reply to myself hoping someone will correct me if I am wrong.

The answer seems to be that Python frees the memory used by dataframes that are no longer referenced. The reference is at https://stackabuse.com/basics-of-memory-management-in-python/.

I gather that if I create what we might call secondary dataframes (perhaps better named summarized or aggregated dataframes) locally, rather than putting them into higher-level .py files and making them universal, then Python will clean them up when I move on to other dashboards.
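A rough sketch of that idea, assuming the granular dataframe is loaded once and each aggregated dataframe is built inside the function that needs it. The names `SALES` and `state_totals_for_chart` and the sample values are hypothetical. In CPython, an object can be reclaimed once nothing references it any more, so a dataframe held only by a local variable becomes eligible for cleanup after the function returns.

```python
import pandas as pd

# Granular dataframe, loaded once and shared (hypothetical sample data).
SALES = pd.DataFrame({
    "Store": ["S1", "S2", "S3", "S4"],
    "State": ["TX", "TX", "CA", "CA"],
    "Sales": [100, 200, 300, 400],
})


def state_totals_for_chart(frame):
    """Build chart data for Sales by State (hypothetical helper)."""
    # The aggregated dataframe is referenced only by this local name.
    summary = frame.groupby("State", as_index=False)["Sales"].sum()
    # Return plain Python lists; `summary` itself is not kept anywhere,
    # so it can be reclaimed after the function returns.
    return {"x": summary["State"].tolist(), "y": summary["Sales"].tolist()}


chart_data = state_totals_for_chart(SALES)
```

By contrast, an aggregated dataframe assigned to a module-level name would stay referenced, and therefore in memory, for as long as that module is loaded.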

If I am wrong on this, please let me know.

If my brief comments are reasonably accurate, I know I have more to learn on the subject, so please post any thoughts.

I appreciate the contributors very much.

James