Figure Friday 2024 - week 28

Hi @AnnMarieW and @li.nguyen

Super exciting to see you use PyCafe for this, and both are amazing apps. Both of them also use surprisingly little code; I’m very impressed.

Regards,

Maarten

Hello all,
first, I want to thank the organisers for setting this up. My gratitude also goes out to the other participants for their contributions; I am learning a lot from this effort. This is a great initiative and I am happy to take part, modest as my contributions may be.

I did not build a dashboard, so my contribution is quite simple compared to what I am seeing here. I look forward to building my skills and submitting my own Dash/Streamlit dashboard at some point.

So what have I done? I spent a lot of time on basic exploratory data analysis, hoping to find some valuable nugget of insight.

I was first curious about what the store was selling, and a glance at the list of product names wasn’t conclusive, so I pushed the product names through a word cloud generator. That made it clear at a glance that we were dealing with an OfficeMax/Office Depot/Staples kind of store.
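For anyone wanting to try the word cloud step: a word cloud is essentially word frequencies drawn at different sizes. Here is a minimal standard-library sketch of the counting step, assuming a plain list of product-name strings. The names below are made up, and the stop-word list and the `word_frequencies` helper are hypothetical, not taken from the original analysis.

```python
import re
from collections import Counter

# Made-up product names for illustration; not rows from the actual dataset.
product_names = [
    "Copy Paper, 500 Sheets",
    "Printer Paper",
    "Stackable Office Chair",
    "Office Desk",
    "Binder Clips",
]

# Tiny hypothetical stop-word list; word-cloud tools usually ship a larger one.
STOP_WORDS = {"and", "for", "the", "with"}

def word_frequencies(names, top_n=10):
    """Lower-case each name, keep alphabetic words longer than two letters
    that are not stop words, and return the top_n most common words."""
    counts = Counter()
    for name in names:
        words = re.findall(r"[a-z]+", name.lower())
        counts.update(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return counts.most_common(top_n)

print(word_frequencies(product_names, top_n=5))
```

The counts could then be handed to a word-cloud library. For example, the `wordcloud` package’s `WordCloud().generate_from_frequencies()` accepts a dict of word to count, so `dict(word_frequencies(...))` would do.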

Next, I looked into how the products were organised within the categories and sub-categories, and drew a treemap of them with total sales as the metric.
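The treemap step boils down to summing sales at each level of the hierarchy. With Plotly Express, `px.treemap(df, path=[...], values=...)` does that rollup for you; the standard-library sketch below only makes the arithmetic explicit. The (category, sub-category, product, sales) tuples are hypothetical and `rollup_sales` is an illustrative helper, not part of any library.

```python
from collections import defaultdict

# Hypothetical line items: (category, sub_category, product_name, sales).
rows = [
    ("Furniture", "Chairs", "Stackable Chair", 300.0),
    ("Furniture", "Chairs", "Stackable Chair", 150.0),
    ("Furniture", "Tables", "Folding Table", 500.0),
    ("Office Supplies", "Paper", "Copy Paper", 20.0),
]

def rollup_sales(rows):
    """Sum sales at every level of the category > sub-category > product
    hierarchy. Each key is the node's path from the root, like a treemap node."""
    totals = defaultdict(float)
    for category, sub_category, product, sales in rows:
        path = (category, sub_category, product)
        # Add this line item's sales to the product node and all of its ancestors.
        for depth in range(1, len(path) + 1):
            totals[path[:depth]] += sales
    return dict(totals)

for node, total in sorted(rollup_sales(rows).items()):
    print(" > ".join(node), total)
```

Because every line item is added to all of its ancestors, a category’s total equals the sum of its sub-categories, which is what the nested rectangle areas in a treemap represent.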

(I also tried a sunburst chart to explore this, but it did not come out as well.)

Looking at the plot provided with the challenge, I found the losses particularly interesting, but the proportional losses were not easy to glean from that chart, so I first added guide lines to classify the profit margins and losses.


It surprised me that the losses were up to 300% of the sales values.
That made me fixate on the losses. What drives them? Where do they occur the most?
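On a sales-vs-profit scatter, each guide line is a line through the origin, profit = margin × sales, so a loss of 300% of sales is simply a margin of -3.0. A minimal sketch of that classification, assuming hypothetical thresholds (`GUIDE_MARGINS` below is not what was used in the original chart) and illustrative helper names:

```python
# Hypothetical guide-line margins (profit as a fraction of sales).
GUIDE_MARGINS = [0.5, 0.25, 0.0, -0.5, -1.0]

def profit_margin(sales, profit):
    """Profit as a fraction of sales; -3.0 means a loss of 300% of sales."""
    if sales <= 0:
        raise ValueError("sales must be positive")
    return profit / sales

def margin_band(sales, profit, guides=GUIDE_MARGINS):
    """Return the highest guide line the point lies on or above,
    or the band below the lowest guide line."""
    m = profit_margin(sales, profit)
    for g in sorted(guides, reverse=True):
        if m >= g:
            return f">= {g:.0%}"
    return f"< {min(guides):.0%}"

def guide_line(margin, max_sales):
    """Two endpoints of the line profit = margin * sales, ready to plot."""
    return [(0.0, 0.0), (max_sales, margin * max_sales)]

print(margin_band(100.0, -300.0))
```

In a Plotly figure, each `guide_line` could be drawn as its own `go.Scatter(..., mode="lines")` trace on top of the scatter of line items.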

Well, it was easy to spot that the furniture category had by far the highest loss-to-sales ratio, but I still didn’t see why.
At first I thought it might simply be because shipping times were longer, leading to more cancellations and returns, which might cause losses.
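Testing the shipping-time idea needs a shipping duration per line item. A minimal sketch, assuming the order and ship dates are month/day/year strings (`DATE_FORMAT` is an assumption about the data) and using hypothetical rows; it groups line items by days-to-ship and averages their profit margins:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

DATE_FORMAT = "%m/%d/%Y"  # assumed format of the date strings

def shipping_days(order_date, ship_date, fmt=DATE_FORMAT):
    """Whole days between ordering and shipping."""
    delta = datetime.strptime(ship_date, fmt) - datetime.strptime(order_date, fmt)
    return delta.days

def mean_margin_by_shipping_days(rows):
    """Group (order_date, ship_date, sales, profit) rows by shipping time
    and return the mean profit margin for each number of days."""
    margins = defaultdict(list)
    for order_date, ship_date, sales, profit in rows:
        margins[shipping_days(order_date, ship_date)].append(profit / sales)
    return {days: mean(ms) for days, ms in sorted(margins.items())}

# Hypothetical line items, not taken from the dataset.
rows = [
    ("11/08/2016", "11/11/2016", 200.0, 40.0),
    ("11/08/2016", "11/11/2016", 100.0, -50.0),
    ("06/12/2016", "06/16/2016", 50.0, 10.0),
]
print(mean_margin_by_shipping_days(rows))
```

With pandas, the same duration would be `(df["Ship Date"] - df["Order Date"]).dt.days` once both columns (names assumed) are parsed as datetimes.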


No obvious pattern emerged between the categories when I plotted profit/loss as a function of shipping (waiting) time.
Then I thought the profit/loss figures might be misleading because each row is a line item, and I should be considering each order as a whole. Orders have an average of 2 line items each, so I aggregated the line items by order ID and compared profit vs. sales at the order level, hoping that, once added up, the line items in an order would tend towards profitability: each order should be profitable even if it includes a loss-leader line item.
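The order-level check amounts to summing sales and profit per order ID before dividing, rather than averaging the line-item margins. A minimal sketch with hypothetical (order ID, sales, profit) tuples; in pandas the aggregation would be roughly `df.groupby("Order ID")[["Sales", "Profit"]].sum()`, assuming those column names:

```python
from collections import defaultdict

def order_level_margins(line_items):
    """Sum sales and profit per order ID, then return each order's
    profit / sales ratio (e.g. -1.5 is a loss of 150% of the order value)."""
    sales = defaultdict(float)
    profit = defaultdict(float)
    for order_id, item_sales, item_profit in line_items:
        sales[order_id] += item_sales
        profit[order_id] += item_profit
    return {oid: profit[oid] / sales[oid] for oid in sales}

# Hypothetical line items: (order_id, sales, profit).
line_items = [
    ("A-1", 100.0, 30.0),   # profitable item
    ("A-1", 50.0, -20.0),   # loss leader; the order as a whole still profits
    ("B-2", 40.0, -60.0),   # single-item order losing 150% of its sales
]
margins = order_level_margins(line_items)
# Orders whose losses exceed 100% of their sales.
losing = {oid: m for oid, m in margins.items() if m < -1.0}
print(margins, losing)
```

Summing before dividing weights each item by its sales, so a small loss-leader cannot dominate the order’s margin the way it would in an unweighted average of item margins.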

Unfortunately, that was not the case; there are still losses of over 100% at the order level… (sorry, I ran out of permitted attachments)

Very interesting analysis you did there, @jens; you did some digging.
I used to use word clouds a lot because they’re a nice way of showing what people talk about or which words are used most frequently. Your case was unique and creative in that you used one to get a better understanding of the store :bulb:

Regarding the last chart, I honestly thought that there would be a correlation between shipping time and profit.

Hey @jens,

Firstly, great job! :star2: I really appreciate how you’ve shared your thought process and figured out which charts work best. Do you have a background in analytics? Your structured approach and the way your exploratory analysis led to more questions is really engaging. This is exactly what got me into data visualization a few years back. It’s always interesting to see how different people can interpret and visualize the same data set in unique ways :rocket:

Don’t worry about not creating a dashboard. It’s not obligatory for this exercise, as @adamschroeder mentioned! Honestly, making the chart is usually the toughest part; adding it to a dashboard is much easier. If you haven’t come across Vizro, it’s a tool that could simplify that dashboard creation step for you. It’s built on top of Dash and it’s what I’ve been using in my submission above :slight_smile:

Bit late for posting this here!

This is great! I love the visual hexagons in the chart :star:

:wave: hi @PyBluePanda nice name :slight_smile: and welcome to the community.

Nice use of the Sankey diagram, and the three dropdowns make it easy to filter the data. Good job.

We’ll be posting the next FigureFriday data set in a couple of hours. Hopefully, you can join that one as well.