How do I overlap bar charts and set the relative position by the timestamp?

I have a series of tasks, each with a name, a start, and an end. I am trying to build a timeline in which all tasks share a single stacked bar, with the time axis reflecting each task’s start and end. I have a feeling there’s something obvious I’m missing here.

Take this data for example:

     Task                      Start                        End  Duration
0  Task A 2021-11-13 00:00:00.000000 2021-11-13 00:13:35.452834  0.226515
1  Task B 2021-11-13 00:13:35.452834 2021-11-13 00:36:52.699531  0.388124
2  Task C 2021-11-13 00:36:52.699531 2021-11-13 01:00:00.000000  0.385361

I can use px.timeline to build something very close, which gives me the layout, but I want all tasks in a single row, so that when I plot several days’ worth of records there is one row per day.

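import plotly.express as px

# One row per task: close to the layout I want, but not a single shared row.
fig = px.timeline(df, x_start='Start', x_end='End', y='Task', color='Task')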
fig.show()

I can also build an overlapping bar chart, but then I can’t position each bar by its start time.

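import pandas as pd
import plotly.graph_objects as go

df = pd.DataFrame(task_times, columns=['Task', 'Start', 'End', 'Duration'])
# Label each bar with the hour nearest its start ("%H:%M:%S" is the portable spelling of "%T").
df['Hour'] = df['Start'].apply(lambda x: x.round(freq='H').strftime("%H:%M:%S"))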

data = []
bar_width = 1.0 / task_count
for task, rows in df.groupby('Task'):
    # Each later task gets a narrower bar so the overlaid bars stay visible.
    width = 1.0 - bar_width * task_names.index(task)
    data.append(go.Bar(
        x=rows['Hour'], y=rows['Duration'],
        width=width, name=task
    ))

fig = go.Figure(data=data, layout={"barmode": "overlay"})
fig.show()

This is the code I used to generate the sample data, and here’s an nbviewer for the full code.

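import numpy as np
import pandas as pd
from datetime import datetime, timedelta

task_count = 3
total_duration = 12  # number of one-hour blocks, each split across the tasks
tasks = np.random.dirichlet(np.ones(task_count), size=total_duration)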
task_names = [f"Task {chr(97 + n).title()}" for n in range(task_count)]

task_times = []
s = datetime.combine(datetime.today(), datetime.min.time())
for task in tasks:
    for x in range(len(task)):
        t, n = task[x], task_names[x]
        e = s + timedelta(hours=t)
        task_times.append((n, s, e, t))
        s = e

df = pd.DataFrame(task_times, columns=['Task', 'Start', 'End', 'Duration'])

Seems like every time I post a question I find the solution about 20 minutes later :)

Digging into the fig.layout and fig.data of the px.timeline output gave me some hints. The simplest way is to take the px.timeline output, iterate over its traces, and set every y value to the same static value, in this case the date formatted as DD-MMM-YYYY.

fig = px.timeline(df, x_start='Start', x_end='End', y='Task', color='Task')
date = s.strftime("%d-%b-%Y")
# Give every bar the same y value so all traces collapse into one row.
for trace in fig.data:
    trace.y = [date] * len(trace.y)

fig.update_layout(
    title='Task Durations',
    yaxis={'title': 'Date'}
)
fig.show()
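
For several days of records, the same trick should extend to one row per day: use each task’s own date as the y value and shift every start and end onto a shared reference date, so the rows line up by time of day. Below is a minimal, stdlib-only sketch of that shift. The names to_day_row and anchor are hypothetical, made up for illustration, and the sketch assumes no task crosses midnight.

```python
from datetime import datetime, timedelta


def to_day_row(start, end, anchor=datetime(1970, 1, 1)):
    """Return (row_label, shifted_start, shifted_end) for one task.

    anchor is an arbitrary reference date (an assumption of this sketch);
    only the time-of-day part matters once every row shares it.
    Assumes the task starts and ends on the same calendar day.
    """
    midnight = datetime.combine(start.date(), datetime.min.time())
    offset = start - midnight  # time of day at which the task starts
    shifted_start = anchor + offset
    shifted_end = shifted_start + (end - start)
    return start.strftime("%d-%b-%Y"), shifted_start, shifted_end


# Example: a 30-minute task on the second day of the data.
row = to_day_row(datetime(2021, 11, 14, 1, 30), datetime(2021, 11, 14, 2, 0))
print(row)
```

The three values could then go into, say, Day, Start and End columns, with y='Day' passed to px.timeline. Calling fig.update_xaxes(tickformat='%H:%M') should keep the placeholder date off the axis labels.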

Building it from scratch requires more settings. The y value has to be identical for all bars; x has to be the bar length, i.e. End minus Start as a timedelta64 with ms precision; legendgroup and offsetgroup both have to be set to the task name; and base has to be set to the start time. In addition, barmode needs to be overlay, and the xaxis needs its anchor set to y and its type set to date.

Sample code is below.

data = []
date = s.strftime("%d-%b-%Y")
for task, rows in df.groupby('Task'):
    bar = go.Bar(
        x=(rows['End'] - rows['Start']).astype('timedelta64[ms]'),
        y=[date for _ in range(len(rows))],
        base=rows['Start'], name=task,
        legendgroup=task, offsetgroup=task,
        orientation='h', showlegend=True,
    )
    data.append(bar)

fig = go.Figure(data=data)
fig.update_layout(
    title='Task Durations', barmode='overlay',
    yaxis={'title': 'Date'},
    xaxis={'anchor': 'y', 'type': 'date'}
)
fig.show()
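
As far as I know, what .astype('timedelta64[ms]') returns depends on the pandas version (older releases give plain floats, pandas 2.x keeps a timedelta dtype), so here is a hedged alternative that computes the bar lengths as plain millisecond floats with the standard library only. It builds the traces as ordinary dicts; timeline_bar_traces is a hypothetical helper name, not part of Plotly.

```python
from datetime import datetime, timedelta


def timeline_bar_traces(tasks, row_label):
    """Build Plotly-style bar trace dicts for one overlaid timeline row.

    tasks: iterable of (name, start, end) tuples with datetime start/end.
    row_label: the shared y value, so every bar lands in the same row.
    """
    grouped = {}
    for name, start, end in tasks:
        trace = grouped.setdefault(name, {
            "type": "bar", "orientation": "h", "name": name,
            "legendgroup": name, "offsetgroup": name,
            "x": [], "base": [], "y": [],
        })
        # On a date axis the bar length is a duration in milliseconds.
        trace["x"].append((end - start) / timedelta(milliseconds=1))
        trace["base"].append(start.isoformat())  # where the bar begins
        trace["y"].append(row_label)
    return list(grouped.values())


# Same layout settings as in the post above.
layout = {"barmode": "overlay", "yaxis": {"title": "Date"},
          "xaxis": {"anchor": "y", "type": "date"}}

day = datetime(2021, 11, 13)
sample = [
    ("Task A", day, day + timedelta(minutes=15)),
    ("Task B", day + timedelta(minutes=15), day + timedelta(minutes=45)),
    ("Task A", day + timedelta(minutes=45), day + timedelta(hours=1)),
]
traces = timeline_bar_traces(sample, day.strftime("%d-%b-%Y"))
```

Since go.Figure accepts trace dicts that carry a "type" key, go.Figure(data=traces, layout=layout) should render the same single-row timeline.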

I’m sure there’s a more efficient way, but this gives me the desired result for now. This is the end result, exactly what I was hoping for. Output in nbviewer is here for the curious.
