Help required generating chart

Hi all,

I have a data set that is multiple rows of the form:

{
   'game': 955,
   'gamedetails.date': '2024-01-30',
   'gamedetails.solution': 'expel',
   'guesses': 5,
   'id': 1,
   'success': 1,
   'user': 1,
   'userdetails.fullname': 'Andrew Hawkins',
   'userdetails.username': 'andy'
}

(the ‘guesses’ field will be either a number from 1-6, or ‘Fail’)

I want to generate a plot that looks like this:

The vertical bars represent, for each user, the percentage of that user’s results that took each number of guesses. The percentage is relative to the number of results the user actually has (not all users have results for all games).

I can get close using the following code:

    fig = px.histogram(
        results,
        width=1920,
        height=1080,
        x="userdetails.username",
        y="guesses",
        color="guesses",
        barmode="group",
        orientation="v",
        histnorm="percent",
        category_orders={"guesses": ["1", "2", "3", "4", "5", "6", "Fail"]},
    )

This gives me this:

There are a couple of issues with this:

  1. The ordering of the bars for each user isn’t in the expected ‘1, 2, 3, 4, 5, 6, Fail’ order
  2. The percentages just don’t add up. For example, for ‘james’ the three tallest bars seem to be around 50, 35 and 20, which is already over 100%

Can anyone help me get the plot I’m trying to achieve?

Thanks

Andy

Hey @adhawkins welcome to the forums.

Could you provide some data (for copy&paste) to play around with?

Here you go @AIMPED :

https://termbin.com/li1s

Andy

Hey @adhawkins,

with the data you provided I don’t get the same figure as the one you provided. I guess you did some kind of preprocessing. Is that the case?

[Figure: plot produced by the code below]

import requests
import pandas as pd
import plotly.express as px


# function source: https://stackoverflow.com/questions/16573332/jsondecodeerror-expecting-value-line-1-column-1-char-0
def get_json(url):
    """Fetch a URL and return its parsed JSON body (None for an empty 204 response)."""
    response = requests.get(url)
    response.raise_for_status()  # raise on HTTP errors instead of printing None
    if response.status_code != 204:
        return response.json()

    
js = get_json('https://termbin.com/li1s')

results = pd.DataFrame(js['results'])

fig = px.histogram(
    results,
    width=600,
    height=400,
    x="userdetails.username",
    y="guesses",
    color="guesses",
    barmode="group",
    orientation="v",
    histnorm="percent",
    category_orders={"guesses": ["1", "2", "3", "4", "5", "6", "Fail"]},
)

fig.show()

@AIMPED Yes, sorry. There is some slight modification of the data, but the end result is very similar.

Your results show the same problems as mine: the order of the bars isn’t logical, and the sum of the percentages of the bars (for example in the ‘old’ group) is obviously not correct.

This is the full code:

from pprint import pprint

import plotly.express as px

results = fetchAllResults()
if results:
    for result in results:
        if result["success"]:
            result["guesses"] = str(result["guesses"])
        else:
            result["guesses"] = "Fail"

    results = sorted(results, key=lambda x: (x["user"], x["guesses"]))

    pprint(results)

    fig = px.histogram(
        results,
        width=1920,
        height=1080,
        x="userdetails.username",
        y="guesses",
        color="guesses",
        barmode="group",
        orientation="v",
        histnorm="percent",
        category_orders={"guesses": ["1", "2", "3", "4", "5", "6", "Fail"]},
    )

    fig.update_layout(
        legend=dict(
            yanchor="top",
            y=-0.05,
            xanchor="left",
            x=0,
        ),
        title_x=0.5,
    )

    fig.write_image("/home/andy/fig1.png")

The ‘fetchAllResults’ function returns the data I provided in the earlier post.

Thanks for taking the time to look into this.

Andy

Well, the bar order is off because plotly.express does some grouping of your DataFrame under the hood. Sorting the DataFrame before creating the figure solves this issue:

results = results.sort_values('guesses', axis=0)

[Figure: plot after sorting the DataFrame by guesses]

I imagine the issue with the percentages has the same cause. I’m not sure which values are used to calculate the percentage.
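
A possible explanation, offered as my reading of the `histnorm` documentation rather than something confirmed in this thread: the normalization is applied within each trace, and `color="guesses"` creates one trace per guess value. If that is right, the bars of each colour sum to 100% across users, instead of each user’s bars summing to 100%. A minimal sketch of the two normalizations in plain Python, using made-up rows (the counts are illustrative, not the real data):

```python
from collections import Counter

# Made-up (user, guess) rows for illustration; not the data from this thread.
rows = (
    [("andy", "3")] * 6 + [("andy", "4")] * 4
    + [("james", "3")] * 2 + [("james", "4")] * 1 + [("james", "Fail")] * 1
)

counts = Counter(rows)                            # (user, guess) -> occurrences
per_user = Counter(user for user, _ in rows)      # user -> total results
per_guess = Counter(guess for _, guess in rows)   # guess -> total results


def percent_within_user(user, guess):
    """Share of this user's results with this guess (what the plot should show)."""
    return 100 * counts[(user, guess)] / per_user[user]


def percent_within_guess(user, guess):
    """Share of this guess value that belongs to this user (per-colour normalization)."""
    return 100 * counts[(user, guess)] / per_guess[guess]


for user in sorted(per_user):
    wanted = sum(percent_within_user(user, g) for g in per_guess)
    per_colour = sum(percent_within_guess(user, g) for g in per_guess)
    print(f"{user}: per-user total {wanted:.0f}%, per-colour total {per_colour:.0f}%")
```

With these rows the per-user totals come out at 100% for both users, while the per-colour totals are 155% for andy and 145% for james, which is the same kind of "more than 100%" effect described in the first post.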

I was already sorting the results, but was doing it based on user first, then number of guesses. I’ve now swapped the order of these around, and I’m getting the bars in a more logical order.

Any way I can get it to calculate the percentages correctly?

Failing that, is there any way I can pass in a data set along the lines of the following:

[
	{
		"user": "andy",
		"1": 2,
		"2": 20,
		"3": 30,
		"4": 20,
		"5": 10,
		"6": 5,
		"Fail": 3
	},
	{
		"user": "james",
		"1": 3,
		"2": 21,
		"3": 31,
		"4": 18,
		"5": 8,
		"6": 5,
		"Fail": 14
	}
]

(i.e. I have already calculated the percentages for each guess) and just have it render the bars based on the numbers for each data point?

Thanks again

Andy

Hi @adhawkins, I’m not sure I understood your last post. If you want to create a bar graph from the data provided, you could do something like this:

import pandas as pd
import plotly.express as px

data = [
	{
		"user": "andy",
		"1": 2,
		"2": 20,
		"3": 30,
		"4": 20,
		"5": 10,
		"6": 5,
		"Fail": 3
	},
	{
		"user": "james",
		"1": 3,
		"2": 21,
		"3": 31,
		"4": 18,
		"5": 8,
		"6": 5,
		"Fail": 14
	}
]

df = pd.DataFrame.from_records(data)
# Wide-form input: every column after 'user' becomes its own bar series
fig = px.bar(df, x='user', y=df.columns[1:], barmode='group')
fig.show()

Concerning the percentages, I tried to figure out how these are calculated but it took me too long :see_no_evil: :sweat_smile:

I even think it might be a bug. If you comment out the color parameter of your original graph, the percentages sum up correctly. This has also been reported here.
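
For completeness, here is a sketch of how the wide table used in the `px.bar` example could be derived from the raw rows instead of being typed in by hand. It assumes the rows are already in a DataFrame with the `userdetails.username`, `guesses` and `success` columns from the first post. The sample rows and the names `ORDER` and `to_percent_table` are made up for illustration.

```python
import pandas as pd

# Bar order wanted within each user group.
ORDER = ["1", "2", "3", "4", "5", "6", "Fail"]

# Made-up rows in the shape of the original data; not the real results.
raw = pd.DataFrame(
    {
        "userdetails.username": ["andy", "andy", "andy", "andy", "james", "james"],
        "guesses": [3, 4, 4, 6, 2, 5],
        "success": [1, 1, 1, 0, 1, 1],
    }
)


def to_percent_table(df):
    """Return one row per user with a percentage column for each label in ORDER."""
    # Label each result with its guess count, or "Fail" when success is falsy.
    labels = df["guesses"].astype(str).where(df["success"].astype(bool), "Fail")
    # normalize="index" divides each row by that user's own total.
    table = pd.crosstab(df["userdetails.username"], labels, normalize="index") * 100
    # Add labels that never occurred as 0 and put the columns in the wanted order.
    table = table.reindex(columns=ORDER, fill_value=0.0)
    table.index.name = "user"
    return table.reset_index()


wide = to_percent_table(raw)
print(wide)
```

The result can then be plotted like the hand-written table above, for example with `px.bar(wide, x="user", y=ORDER, barmode="group")`. Because the columns are passed in `ORDER`, the bars within each user group should also come out in the expected sequence.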

Thanks, that’s exactly what I need.

Andy
