Plotly express add_trace() exhibits unexpected behavior if number of rows is greater than 1000

Pra713 · April 14, 2024, 6:36pm

Here is a script that plots scatter plots from two data frames. df_1 has n_row rows with random x and y values, while df_2 has 20 rows with random x and y values. I use plotly express (version 5.20.0) to first create a scatter plot for df_1, and then use add_trace() to add a scatter plot for df_2.

import numpy as np
import pandas as pd
import plotly.express as px

n_row = 1000
df_1 = pd.DataFrame(dict(x=np.random.rand(n_row), y=np.random.rand(n_row)))
df_2 = pd.DataFrame(dict(x=np.random.rand(20), y=np.random.rand(20)))
fig = px.scatter(df_1, 'x', 'y').update_traces(marker_color='red', marker_size=4)
fig.add_trace(px.scatter(df_2, 'x', 'y').update_traces(marker_color='blue', marker_size=20).data[0])

First, I set n_row equal to 100 and created the plot. As expected, it plots the second scatter plot (from df_2) on top of the first scatter plot (from df_1).

Next, I set n_row to 1001 and ran the same code.

n_row = 1001
df_1 = pd.DataFrame(dict(x=np.random.rand(n_row), y=np.random.rand(n_row)))
df_2 = pd.DataFrame(dict(x=np.random.rand(20), y=np.random.rand(20)))
fig = px.scatter(df_1, 'x', 'y').update_traces(marker_color='red', marker_size=4)
fig.add_trace(px.scatter(df_2, 'x', 'y').update_traces(marker_color='blue', marker_size=20).data[0])

This time, however, it plots the first scatter plot (from df_1) on top of the second scatter plot (from df_2).

I’ve tried a bunch of values for n_row. When n_row is less than or equal to 1000, the plot order is as expected (the second scatter plot is plotted on top of the first). When n_row is greater than 1000, the first scatter plot (from df_1) is plotted on top of the second (from df_2).

Is this a bug or am I missing something?

r-beginners · April 15, 2024, 3:07am

I ran your code and confirmed the behavior. When the data frame passed to a Plotly Express scatter plot has more than 1000 rows, the trace type changes from the normal scatter to scattergl, the WebGL-based scatter trace meant for large amounts of data; see here for more information about scattergl. My guess is that this is the cause. If you build each trace with graph objects instead, the overlap order stays the same even with 1001 rows.

import plotly.graph_objects as go
import numpy as np
import pandas as pd

n_row = 1001
df_1 = pd.DataFrame({'x': np.random.rand(n_row), 'y': np.random.rand(n_row)})
df_2 = pd.DataFrame({'x': np.random.rand(20), 'y': np.random.rand(20)})

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df_1.x,
    y=df_1.y,
    mode='markers',
    marker=dict(color='red', size=4)
))

fig.add_trace(go.Scatter(
    x=df_2.x,
    y=df_2.y,
    mode='markers',
    marker=dict(color='blue', size=20)
))

fig.show()

Pra713 · April 17, 2024, 4:07am

Thank you for confirming. Yes, you are likely right: this is due to the switch to scattergl when the number of rows exceeds 1000. The script you provided works. That being said, this still feels like a bug that should be corrected.
