Data overlap in chart when combining rangebreaks bounds with values

  1. When I use the following code for xaxes rangebreaks, the chart shows all data in the interval Feb 13 to Feb 22, excluding the weekend of Feb 18-19 and the holiday on Feb 20:

fig.update_xaxes(rangebreaks=[dict(values=["2023-02-18", "2023-02-19", "2023-02-20"])])

  2. When I also want to exclude the hours where no data exists, i.e., between 8 pm and 4 am, the data suddenly overlaps:

fig.update_xaxes(rangebreaks=[dict(bounds=[20, 4], pattern="hour"), dict(values=["2023-02-18", "2023-02-19", "2023-02-20"])])

  3. If I remove the dates, the hours exclusion works, but then the three dates that have no data are included:

fig.update_xaxes(rangebreaks=[dict(bounds=[20, 4], pattern="hour")])

How do I get (2) above to work?

Thanks for any suggestions!

Hi! Did you ever figure this out? I’m having the same issue. Thanks!

No, I very much would like to get a solution to the issue.

hi @robin1 hi @brunerm99

Can you please provide us with a minimal reproducible example of the error, together with the sample data you use?

Hi @adamschroeder,

Below is some code that generates dummy data and simulates a time gap (or a holiday, in the case of the stock market) filled with NaNs that I would like to remove from the plot. This works fine, as seen in (2), but when you introduce other rangebreaks, such as keeping only weekdays and trading hours (my use case is financial plotting), some of the data overlaps. The data looks strange because it’s not real market data, but I have also attached a real-world screenshot that shows what I’m seeing.

Minimal reproducible example:

import numpy as np
import pandas as pd
import plotly.express as px

def generate_random_data(n, mu=0.001, sigma=0.01, start_price=5):
    np.random.seed(0)
    ts = pd.date_range("2018-01-01", periods=n, freq="H")
    returns = np.random.normal(loc=mu, scale=sigma, size=n)
    close = start_price * (1 + returns).cumprod()
    return pd.DataFrame({"timestamp": ts, "close": close})

data = generate_random_data(n=500)

# Say there is a holiday here that I don't have data for
holiday = "2018-01-04"
data[pd.to_datetime(data["timestamp"].dt.date) == pd.to_datetime(holiday)] = np.nan

fig = px.line(x=data["timestamp"], y=data["close"])
fig.update_xaxes(rangebreaks=[
    dict(bounds=[20, 13.5], pattern="hour"),            # Remove non-trading hours
    dict(bounds=["sat", "mon"]),                        # Remove weekends
    dict(values=[holiday]),                             # Date I would like to remove
])
fig.update_layout(
    title=dict(
        text="Rangebreaks Issue On Time-Series Data",
        x=0.5,
    ),
    xaxis=dict(title="Time"),
    yaxis=dict(title="Price ($)"),
    font=dict(size=16), 
)
fig.show()

(1): Time series without any range breaks added (working as expected).

(2): Time series with single date removed (working as expected).

(3): Time series with date, weekends and non-trading hours removed (not working).

(4): Real example using candlestick plot (not working).

Thank you @brunerm99

Can you please share the sample data to make your code a minimal reproducible example?

Hi @adamschroeder, the code snippet above generates the random data I used in screenshots (1-3). If you want the real market data shown in (4), I can put it into a pickle file when I get home tonight and attach it here. The code snippet should be all you need to reproduce screenshots (1-3). You just need to comment or uncomment some of these lines to generate the differences seen in (1-3).

...
fig.update_xaxes(rangebreaks=[
    dict(bounds=[20, 13.5], pattern="hour"),            # Remove non-trading hours
    dict(bounds=["sat", "mon"]),                        # Remove weekends
    dict(values=[holiday]),                             # Date I would like to remove
])
...

hi @brunerm99
You’re right, the result is not desirable.

With the current code, Jan 1-3 overlap with Jan 8-11 on the xaxis (as seen in your image as well).

I restricted the hour bounds from bounds=[20, 13.5] to bounds=[20, 24], and that seems to work.

bounds=[13, 24] also seems to plot correctly:

But whenever the hour bounds span from one day to the next, e.g., bounds=[20, 1] or bounds=[20, 13], it doesn’t plot correctly.

At this point I’m assuming it’s a bug. Do you get a different result with other hour bounds?
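
The observation above suggests splitting any hour bound that crosses midnight into two same-day bounds. Here is a minimal sketch of that idea, assuming a hypothetical helper named split_hour_bounds (it is not part of Plotly; it only builds the dicts that would be passed to rangebreaks). Whether this avoids the overlap in every case is not confirmed in this thread.

```python
def split_hour_bounds(start, end):
    """Build hour rangebreak dicts that never cross midnight.

    Hypothetical helper for illustration, not a Plotly function.
    start and end are hours of the day in the range 0 to 24.
    """
    if start == end:
        raise ValueError("start and end must differ")
    if start < end:
        # The bound already sits inside a single day.
        return [dict(bounds=[start, end], pattern="hour")]
    # The bound crosses midnight: cut it at hour 24 / hour 0.
    breaks = [dict(bounds=[start, 24], pattern="hour")]
    if end > 0:
        breaks.append(dict(bounds=[0, end], pattern="hour"))
    return breaks


# The non-trading hours from the example above, split in two.
hour_breaks = split_hour_bounds(20, 13.5)
```

The resulting list can be concatenated with the other rangebreaks entries before calling fig.update_xaxes(rangebreaks=...).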

@adamschroeder, it looks like changing the order and avoiding hour bounds that cross midnight can work. The order is very finicky for some reason, though.

It works for this code snippet, with data missing on these 3 days:

rangebreaks = [
    dict(bounds=["sat", "mon"]),
    dict(values=[
        "2023-04-07", 
        "2023-04-08", 
        "2023-04-09", 
    ]),
    dict(bounds=[20, 24], pattern="hour"), 
    dict(bounds=[0, 13.5], pattern="hour"), 
]

Original without values rangebreaks for reference:

Rearranged rangebreaks array which completely breaks the plot:

rangebreaks = [
    dict(bounds=["sat", "mon"]),
    dict(bounds=[20, 24], pattern="hour"), 
    dict(bounds=[0, 13.5], pattern="hour"), 
    dict(values=[
        "2023-04-07", 
        "2023-04-08", 
        "2023-04-09", 
    ]),
]
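
To avoid getting the order wrong by hand, the arrangement that worked above (day-of-week bounds first, explicit values second, hour bounds last) could be enforced with a small sort. This is only a sketch under the assumption that this order is what matters; order_rangebreaks is a hypothetical helper, not a Plotly function, and the thread only reports this order working for the one case shown.

```python
def order_rangebreaks(breaks):
    """Reorder rangebreak dicts into the order reported to work in this thread.

    Hypothetical helper, not part of Plotly. Assumed order:
    day-of-week bounds, then explicit values, then hour bounds.
    """
    def rank(rangebreak):
        if rangebreak.get("pattern") == "hour":
            return 2  # hour bounds go last
        if "values" in rangebreak:
            return 1  # explicit dates go in the middle
        return 0      # everything else (e.g. weekend bounds) goes first

    # sorted() is stable, so entries with the same rank keep their order.
    return sorted(breaks, key=rank)


# The arrangement that broke the plot, reordered into the working one.
rangebreaks = order_rangebreaks([
    dict(bounds=["sat", "mon"]),
    dict(bounds=[20, 24], pattern="hour"),
    dict(bounds=[0, 13.5], pattern="hour"),
    dict(values=["2023-04-07", "2023-04-08", "2023-04-09"]),
])
```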

Nice workaround, @brunerm99. I’m glad you found a solution.

Yeah, thanks for the help!