Plotly Express add_trace() behaves unexpectedly when the number of rows is greater than 1000

Here is a script that plots scatter plots from two data frames. df_1 has n_row rows with random x and y values, while df_2 has 20 rows with random x and y values. I use plotly express (version 5.20.0) to first create a scatter plot for df_1, and then use add_trace() to add a scatter plot for df_2.

import numpy as np
import pandas as pd
import plotly.express as px

n_row = 1000
df_1 = pd.DataFrame(dict(x=np.random.rand(n_row), y=np.random.rand(n_row)))
df_2 = pd.DataFrame(dict(x=np.random.rand(20), y=np.random.rand(20)))
fig = px.scatter(df_1, 'x', 'y').update_traces(marker_color='red', marker_size=4)
fig.add_trace(px.scatter(df_2, 'x', 'y').update_traces(marker_color='blue', marker_size=20).data[0])

First, I set n_row equal to 1000 and created the plot. As expected, it plots the second scatter plot (from df_2) on top of the first scatter plot (from df_1).

Next, I set n_row to 1001 and ran the same code.

n_row = 1001
df_1 = pd.DataFrame(dict(x=np.random.rand(n_row), y=np.random.rand(n_row)))
df_2 = pd.DataFrame(dict(x=np.random.rand(20), y=np.random.rand(20)))
fig = px.scatter(df_1, 'x', 'y').update_traces(marker_color='red', marker_size=4)
fig.add_trace(px.scatter(df_2, 'x', 'y').update_traces(marker_color='blue', marker_size=20).data[0])

This time, however, it plots the first scatter plot (from df_1) on top of the second scatter plot (from df_2).

I’ve tried a bunch of values for n_row. When n_row is less than or equal to 1000, the plot order is as expected (the second scatter plot is plotted on top of the first). When n_row is greater than 1000, the first scatter plot (from df_1) is plotted on top of the second (from df_2).

Is this a bug or am I missing something?

I ran your code and can reproduce the behavior. When a Plotly Express scatter plot has more than 1000 data points, it switches from the regular scatter trace to scattergl, a WebGL-based trace intended for large data sets; see here for more information about scattergl. My guess is that this is the cause. If you build each trace with graph objects instead, the stacking order stays the same even with 1001 data points.

import plotly.graph_objects as go
import numpy as np
import pandas as pd

n_row = 1001
df_1 = pd.DataFrame({'x': np.random.rand(n_row), 'y': np.random.rand(n_row)})
df_2 = pd.DataFrame({'x': np.random.rand(20), 'y': np.random.rand(20)})

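fig = go.Figure()

# go.Scatter is the regular SVG scatter trace, whatever the number of points
fig.add_trace(go.Scatter(
    x=df_1.x,
    y=df_1.y,
    mode='markers',
    marker=dict(color='red', size=4)
))

# Added second, so the blue markers are drawn on top of the red ones
fig.add_trace(go.Scatter(
    x=df_2.x,
    y=df_2.y,
    mode='markers',
    marker=dict(color='blue', size=20)
))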

fig.show()

Thank you for confirming. Yes, you are likely right: this is due to the switch to scattergl when the number of data points exceeds 1000. The script you provided works. That being said, this still feels like a bug that should be corrected.