
Dash performance subplots/append_trace

Hi Dash community,

I would like to understand the performance limitations of our current dashboard when it comes to displaying time series.
The part of the dashboard I would like to optimize looks like this:

To create these, I call make_voltage_subplots twice. It takes three dataframes (L1_df, L2_df, L3_df), each of shape (2788, 5), as input and displays 3 of the 5 columns in each subplot using the following code:

@decorators.display_exec_time
def make_voltage_subplots(L1_df,L2_df,L3_df,mp = '',sorting=False):
    # One row with three columns: one subplot per phase (L1, L2, L3)
    fig = make_subplots(rows=1, cols=3)
    if not L1_df.empty:
        print(L1_df.shape)
        fig = add_lines(L1_df,'L1',1,fig,sorting)
    if not L2_df.empty:
        fig = add_lines(L2_df,'L2',2,fig,sorting)
    if not L3_df.empty:
        fig = add_lines(L3_df,'L3',3,fig,sorting)
    fig.update_layout(
        title= f'''Spenningsdata per fase -- MP: {mp}''',
        yaxis={'title':'Spenning (kWh/h)','range':RANGE_VOLTAGE},
        yaxis2={'range':RANGE_VOLTAGE},
        yaxis3={'range':RANGE_VOLTAGE},
        legend=dict(orientation="h",x=0.4),
        template="plotly_white"
        )

    return fig

The add_lines function takes one dataframe as input and uses append_trace to add three Scattergl traces (min, max, average) to the given subplot. The function looks like this:

@decorators.display_exec_time
def add_lines(df,str_phase,clr_line,fig,sorting):
    # Sorted view: plot each column in descending order against its rank
    if sorting:
        x = list(range(0,len(df.index),1))
        y_min = list(df['min'].sort_values(ascending = False))
        y_max = list(df['max'].sort_values(ascending = False))
        y_avg = list(df['average'].sort_values(ascending = False))
    else:
        # Time-series view: plot each column against the dataframe index
        x = list(df.index)
        y_min = list(df['min'])
        y_max = list(df['max'])
        y_avg = list(df['average'])
    
    #min
    fig.append_trace(
        go.Scattergl(
            x = x, 
            y = y_min,
            mode = 'lines',
            name = f'{str_phase} spenning -- min',
            showlegend = False,
            line = dict(color = 'rgb(10, 10, 10)',width = 0.5)
            ),
        1,
        clr_line
        )
    #max
    fig.append_trace(
        go.Scattergl(
            x = x, 
            y = y_max,
            mode='lines',
            name=f'{str_phase} spenning -- max',
            showlegend=False,
            line=dict(color='rgb(10, 10, 10)',width = 0.5)
            ),
        1,
        clr_line
        )
    #avg
    fig.append_trace(
        go.Scattergl(
            x = x, 
            y = y_avg,
            mode='lines',
            name=f'{str_phase} spenning -- avg',
            line=dict(color=clr.DEFAULT_PLOTLY_COLORS[clr_line],width = 1)
            ),
        1,
        clr_line
        )
    
    return fig

When running the dashboard, I get the following timings (I call the function twice; see below for the end result in the dashboard):

Execution time for 'add_lines': 1.254s
Execution time for 'add_lines': 1.299s
Execution time for 'add_lines': 1.355s
Execution time for 'make_voltage_subplots': 4.780s
Execution time for 'add_lines': 1.135s
Execution time for 'add_lines': 1.179s
Execution time for 'add_lines': 1.104s
Execution time for 'make_voltage_subplots': 3.708s

Question: what could I do differently to speed up the creation of these subplots?

Almost 5 seconds for the first figure and about 4 seconds for the second (which includes an extra step of sorting the data!?) is unfortunately not responsive enough for our current use of the dashboard.

Thanks in advance

Hi @scco, would you mind adding a few calls to time.time() within your add_lines function and printing the differences between them at the end of the function? (You will need to set debug=True to see the print output.) I’m puzzled here, since running a few tests showed much more reasonable numbers; see the output below for example

>>> from plotly import subplots
>>> import plotly.graph_objects as go
>>> fig = subplots.make_subplots(1, 3)
>>> import numpy as np
>>> x = np.random.random(3000)
>>> %timeit -n 1 _ = fig.append_trace(go.Scattergl(x=x, y=x), 1, 1)
4.98 ms ± 666 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
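
In case it helps, here is a minimal sketch of that kind of instrumentation, using only the standard library. The `timed` helper and the labels are made up for illustration (they are not part of Dash or plotly), and it uses time.perf_counter() instead of time.time() because that clock is meant for measuring short intervals.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print the wall-clock time spent inside the `with` block.

    `results` is an optional dict (hypothetical helper argument) that
    collects the elapsed time per label for later inspection.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"Execution time for {label}: {elapsed:.3f}s")


# Stand-in workload; in add_lines each block would wrap one step instead,
# e.g. the list conversions and each append_trace call.
timings = {}
with timed("Step 0", timings):
    data = list(range(100_000))
with timed("sum", timings):
    total = sum(data)
```

Wrapping each append_trace call in its own `with timed(...)` block should show whether the time is spent in plotly or in preparing the data.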

Thanks @Emmanuelle, I will definitely do that on Thursday when I get back to the office. Makes sense to decompose the call and see where the execution time is spent. Your test gives hope 🤞

@Emmanuelle: I have added some timing calls inside the add_lines function (step 0 is the if statement, followed by the three append_trace calls).

These are the execution times I get:

Execution time for Step 0: 0.008s
Execution time for append_trace 1: 0.403s
Execution time for append_trace 2: 0.398s
Execution time for append_trace 3: 0.391s
Execution time for 'add_lines': 1.202s
Execution time for Step 0: 0.008s
Execution time for append_trace 1: 0.394s
Execution time for append_trace 2: 0.397s
Execution time for append_trace 3: 0.393s
Execution time for 'add_lines': 1.194s
Execution time for Step 0: 0.009s
Execution time for append_trace 1: 0.402s
Execution time for append_trace 2: 0.396s
Execution time for append_trace 3: 0.435s
Execution time for 'add_lines': 1.244s
Execution time for 'make_voltage_subplots': 4.779s

I then copy/pasted your code and got about 6 ms per call 🤔
Then I understood the difference:
I did

x = list(df.index)
y_min = list(df['min'])

whereas you used x = np.random.random(3000), i.e. a NumPy array instead of a Python list.

So the solution was as “simple” as changing x = list(df.index) to x = df.index.to_numpy(dtype=object), and likewise for all the vectors that are passed to append_trace.

And then, magically, a roughly 50x improvement in execution time (add_lines drops from about 1.2s to about 0.024s):

Execution time for Step 0: 0.003s
Execution time for append_trace 1: 0.007s
Execution time for append_trace 2: 0.006s
Execution time for append_trace 3: 0.006s
Execution time for 'add_lines': 0.023s
Execution time for Step 0: 0.002s
Execution time for append_trace 1: 0.006s
Execution time for append_trace 2: 0.007s
Execution time for append_trace 3: 0.006s
Execution time for 'add_lines': 0.024s
Execution time for Step 0: 0.003s
Execution time for append_trace 1: 0.007s
Execution time for append_trace 2: 0.006s
Execution time for append_trace 3: 0.007s
Execution time for 'add_lines': 0.025s

Thank you!

Just to complete the information for a future me in distress:

  • if you want to display floats, use x=df['max'].to_numpy(dtype=np.float32) or simply df['max'].to_numpy()
  • if you have dates (as I do in df.index), I have been using x=df.index.to_numpy(dtype=object). This works well, as I can zoom in and out in the graph and the dates scale automatically… but it is slow: about 10x slower than, for example, x=df.index.to_numpy(dtype='datetime64[ns]'), which displays floats (nanoseconds) instead of dates. I have been skimming through the Pandas/NumPy docs, but have not yet found a performant way to call to_numpy with dates.
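
To make the difference between these conversions concrete, here is a small sketch of what each call returns. The three-row series is made up for illustration; only the dtypes are the point, and this says nothing about how plotly renders them.

```python
import numpy as np
import pandas as pd

# Tiny made-up series with a DatetimeIndex, standing in for df['max'].
index = pd.date_range("2020-01-01", periods=3, freq="D")
values = pd.Series([229.5, 230.1, 231.0], index=index, name="max")

y_default = values.to_numpy()                     # float64 array
y_f32 = values.to_numpy(dtype=np.float32)         # float32 array
x_obj = index.to_numpy(dtype=object)              # array of pandas.Timestamp objects
x_dt64 = index.to_numpy(dtype="datetime64[ns]")   # native NumPy datetime64 array

for name, arr in [("y_default", y_default), ("y_f32", y_f32),
                  ("x_obj", x_obj), ("x_dt64", x_dt64)]:
    print(name, arr.dtype, type(arr[0]).__name__)
```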

Do you even need .to_numpy? I would’ve expected df.index to work as we do this conversion behind the scenes in our serializer: https://github.com/plotly/plotly.py/blob/4bf740b9e47affbe99002974fd5aee1432a2f2e2/packages/python/plotly/_plotly_utils/utils.py#L67-L115

If not, then perhaps that serializer needs to be updated

Yes @chriddyp, completely correct, I don’t even need that:
With .to_numpy():

Execution time for 'make_voltage_subplots': 1.163s
Execution time for 'make_voltage_subplots': 0.385s
Execution time for 'make_agg_figure': 0.040s

Without .to_numpy():

Execution time for 'make_voltage_subplots': 0.862s
Execution time for 'make_voltage_subplots': 0.354s
Execution time for 'make_agg_figure': 0.036s

Dropping dtype=object for df.index helped in particular (from 1.163s to 0.862s).

The whole dashboard loads in roughly 1s now, vs. 10s before 🤩

@scco glad that you found a solution (and that plotly code was not the bottleneck ha ha!). Aggressive profiling is usually very useful ;-).