Plotly chart like heat map with values from one column and color from another column

I have the following dataset with many rows, multiple samples and 3 columns. I need to plot a chart that looks like a heatmap, but the color should fill not just the position itself but the whole span back to the previous position.

Name position category
Sample1 15500 1
Sample1 15800 2
Sample1 16200 2
Sample1 17200 3
Sample1 17400 3
Sample1 17700 3
Sample1 18300 2
Sample1 20010 2
Sample1 22120 1
Sample1 30000 3
Sample2 15880 1
Sample2 16200 1
Sample2 16900 3
Sample2 18200 3
Sample2 18500 2
Sample2 20400 1
Sample2 21300 2
Sample2 24800 3
Sample2 26000 1
Sample2 30000 3

I first converted this to a pivot table

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

sample_pivot = sample.pivot_table(columns="position", index="Name", values="category")

Then I used Plotly to chart this

fig = px.imshow(sample_pivot)
fig.update_xaxes(range=[1, 35000])
fig.show()

I got the following chart

I need to modify the chart above so that the color fills the space between positions, as listed below: each color should extend back to the previous position.

Sample 1 :
1-15500 → Dark blue
15501-16200 → orange red
16201-17700 → yellow
17701-20010 → orange red
20011-22120 → Dark blue
22121-30000 → yellow

Hi @yada !

Welcome to the forum! :tada:

I’m not sure px.imshow() is the best way to achieve what you need.
It is not impossible, but you would have to construct the whole “grid” of your image using the smallest step of your 'position', which seems to be 10 in the sample you provided.
That means that if the range is [0, 35000], you will have an array of 3500 values for each “Sample”, with roughly 1550 of them for your first 'position' alone, whereas you should need only one “rectangle”.
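
To make the size of that grid concrete, here is a rough sketch of the dense row px.imshow() would need for Sample1. The step of 10, the upper bound of 35000 and the variable names are assumptions taken from the numbers above, and np.searchsorted is only one possible way to fill the cells.

```python
import numpy as np

# Sample1 rows from the question
positions = np.array([15500, 15800, 16200, 17200, 17400,
                      17700, 18300, 20010, 22120, 30000])
categories = np.array([1, 2, 2, 3, 3, 3, 2, 2, 1, 3])

step = 10          # assumed grid resolution (smallest step seen in 'position')
upper = 35000      # assumed upper bound of the x axis
grid = np.arange(step, upper + step, step)  # 10, 20, ..., 35000

# For each grid cell, index of the first position that is >= the cell,
# i.e. the row whose color should "trace back" over that cell
idx = np.searchsorted(positions, grid, side="left")

# Cells beyond the last position have no category and stay NaN
row = np.full(grid.shape, np.nan)
covered = idx < len(positions)
row[covered] = categories[idx[covered]]

print(len(grid), "cells per sample,", int((idx == 0).sum()), "of them for the first position")
```

Stacking one such row per sample would give an array that px.imshow() can draw, but it stores thousands of cells where ten rectangles carry the same information.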

I propose to construct only the “rectangles” you need, using a horizontal bar figure.
For that you will have to provide:

  • The "base", which will be the left edge of each rectangle
  • The "x", which will be the width of each rectangle
  • The "y", which will be the 'Name' column

The figure should use one trace per 'category', which is easier to handle and lets you provide one color for each trace.

The main work here is to find the right "base" and "x".

Here is the result:

Code
# df is the DataFrame from the question (named `sample` there)
# The left edge of each rectangle is the previous row's position
df['base'] = df['position'].shift()
# The width of each rectangle is the difference between two successive positions
df['width'] = df['position'].diff()

# Fix the first row of each Sample: its base must be 0 and its width its own 'position'.
# That row is detected by a NaN (very first row of df) or by a negative width
# (the position drops when a new Sample starts, as it does in this dataset).
df['base'] = df.apply(lambda x: 0 if (x['width'] < 0 or pd.isna(x['base'])) else x['base'], axis=1)
df['width'] = df.apply(lambda x: x['position'] if (x['width'] < 0 or pd.isna(x['width'])) else x['width'], axis=1)

color = {1: 'blue', 2: 'orange', 3: 'yellow'}

fig = go.Figure()
# Create one trace for each category
for cat in sorted(df['category'].unique()):
    # filter by cat
    dff = df[df['category'] == cat]
    fig.add_bar(
        orientation='h',
        base=dff['base'],
        x=dff['width'],
        y=dff['Name'],
        marker_color=color[cat],
        marker_line_width=0,
        name=str(cat)
    )

# reverse y to have Sample 1 at the top
fig.update_yaxes(autorange="reversed")
# set bargap=0 to have no gap between horizontal bars, and overlay to not stack the traces
fig.update_layout(legend_title_text='Category', bargap=0, barmode="overlay")

fig.show()
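
One caveat on the data preparation above: detecting the first row of each Sample through a negative width only works when a Sample starts at a lower position than the previous one ended. A possible alternative, sketched here on a small hypothetical dataset where that is not the case, is to shift within each 'Name' group instead of relying on row order.

```python
import pandas as pd

# Hypothetical data: Sample2 starts ABOVE the last position of Sample1,
# so the row-to-row difference never becomes negative
df = pd.DataFrame({
    "Name": ["Sample1"] * 3 + ["Sample2"] * 3,
    "position": [15500, 15800, 16200, 16500, 16900, 18200],
    "category": [1, 2, 2, 1, 1, 3],
})

# Row-order detection as in the code above: only the very first row is flagged
flagged = (df["position"].diff() < 0) | df["position"].shift().isna()

# Group-aware version: previous position within the same Sample,
# and 0 for the first row of each Sample
df["base"] = df.groupby("Name")["position"].shift().fillna(0)
df["width"] = df["position"] - df["base"]

print(df)
```

The resulting 'base' and 'width' columns can be passed to fig.add_bar() exactly as before.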

Is this what you need?

Not sure if related but maybe interesting:

Hello @Skiks ,

Thank you for your solution. I didn’t think of representing the data as rectangles. It worked for my dataset. I am still a beginner with lambda functions and am trying to understand the solution. Thanks for your clear explanation.

Also, I would like to represent the dataset slightly differently, using start and end positions, to compare tool efficiency. Can your code be modified slightly to draw these blocks, or is there an easy way in Plotly to chart this kind of dataset?

Name Start End category
Sample1 2500 15500 1
Sample1 15700 15800 2
Sample1 16000 16200 2
Sample1 16300 17200 3
Sample1 17350 17400 3
Sample1 17550 17700 3
Sample1 18200 18300 2
Sample1 19550 20010 2
Sample1 20500 22120 1
Sample1 25000 30000 3
Sample2 1000 15880 1
Sample2 16100 16200 1
Sample2 16800 16900 3
Sample2 18100 18200 3
Sample2 18300 18500 2
Sample2 19200 20400 1
Sample2 20500 21300 2
Sample2 22350 24800 3
Sample2 25550 26000 1
Sample2 27900 30000 3

Hi @yada !

I agree, lambda functions are not obvious concepts!

df['base'] = df.apply(lambda x: 0 if (x['width'] < 0 or pd.isna(x['base'])) else x['base'], axis=1)

is equivalent to

def my_function(x):
    if x['width'] < 0 or pd.isna(x['base']):
        return 0
    else:
        return x['base']

df['base'] = df.apply(my_function, axis=1)

and df.apply() with axis=1 will call this function on every row, meaning x will be one row of df.
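
If it helps, here is a small sketch you can run to check that the two forms give the same result. The four-row DataFrame is made up for illustration, and the last variant, using Series.mask(), is an optional vectorized alternative rather than part of the solution above.

```python
import pandas as pd

# Made-up data: the position drops on the third row, where Sample2 starts
df = pd.DataFrame({
    "Name": ["Sample1", "Sample1", "Sample2", "Sample2"],
    "position": [15500, 15800, 15000, 16200],
})
df["base"] = df["position"].shift()
df["width"] = df["position"].diff()

def my_function(x):
    # x is one row of df
    if x["width"] < 0 or pd.isna(x["base"]):
        return 0
    else:
        return x["base"]

by_function = df.apply(my_function, axis=1)
by_lambda = df.apply(
    lambda x: 0 if (x["width"] < 0 or pd.isna(x["base"])) else x["base"], axis=1
)

# Same rule on whole columns at once: replace 'base' by 0 where the condition holds
is_first = (df["width"] < 0) | df["base"].isna()
vectorized = df["base"].mask(is_first, 0)

print(by_function.tolist(), by_lambda.tolist(), vectorized.tolist())
```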

It should be possible to use the same figure type with your new dataset; only the data preparation needs to be adapted.
With this dataset there will be gaps, right? For example, the first row ends at 15500 and the second row starts at 15700, meaning a gap of 200?
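
A quick way to check for such gaps, sketched on the first three rows of each Sample from your new table (the 'width' and 'gap' column names are my own choice):

```python
import pandas as pd

# First three rows of each Sample from the Start/End dataset
df = pd.DataFrame({
    "Name": ["Sample1"] * 3 + ["Sample2"] * 3,
    "Start": [2500, 15700, 16000, 1000, 16100, 16800],
    "End": [15500, 15800, 16200, 15880, 16200, 16900],
    "category": [1, 2, 2, 1, 1, 3],
})

# Width of each block
df["width"] = df["End"] - df["Start"]
# Distance between a block's Start and the previous block's End in the same Sample
# (NaN for the first block of each Sample)
df["gap"] = df["Start"] - df.groupby("Name")["End"].shift()

print(df)
```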

Actually it is much easier!
You can pass 'Start' directly as base and 'End' - 'Start' as x:

fig = go.Figure()
# Create one trace for each category
for cat in sorted(df['category'].unique()):
    # filter by cat
    dff = df[df['category'] == cat]
    fig.add_bar(
        orientation='h',
        base=dff['Start'],
        x=dff['End'] - dff['Start'],
        y=dff['Name'],
        marker_color=color[cat],
        marker_line_width=0,
        name=str(cat)
    )

Thank you for your clear explanation. Yes, there are gaps in the dataset. Your figure is exactly what I need.
