High cpu in the browser and python

Hello,

I’m running a dashboard with live updates, showing 8 line charts with a refresh interval of 1 second. The browser (Chrome) sits at 100% CPU (for Google Chrome Helper) pretty much the whole time, and the Python process is at about 60% the whole time. The server preloads the data, so all it’s doing is processing the callbacks and slicing the data to be rendered.

I’m not running the server in debug mode.

Any ideas how I can reduce the CPU usage?

EDIT: using Scattergl, browser CPU goes down to 50%.

Thanks

Can you provide a MWE?

Hi Emil, apologies for the delay. Not sure if I can attach something here, so I’ll just paste an example below, along with a screenshot of what it looks like.

import random
import webbrowser

import dash
import dash_core_components as dcc
import dash_html_components as html
import plotly
import plotly.graph_objs as go
from dash.dependencies import Output, Event

# -----------------------------------------------------------------------------

queue_len = 100  # the size of the shown window
metric_names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
lines = ['1', '2', '3', '4', '5', '6', '7', '8']

metric_to_current_index = {mn: 0 for mn in metric_names}
X = list(range(1, 100000))
Y = dict((m, dict((l, [random.random() for _ in range(100000)]) for l in lines)) for m in metric_names)

# initialise the server
app = dash.Dash(__name__)
app.config['suppress_callback_exceptions'] = True


def main():
    init_layout(1000)
    webbrowser.open('http://localhost:8050')
    app.run_server(host='0.0.0.0')


def init_layout(refresh_interval):
    app.layout = html.Div([
        html.Div([
            html.H3('I'),
            dcc.Graph(id='bar_plot', config={'displaylogo': False})
        ], className="row"),
        html.Div([
            html.Div([
                html.H3('H'),
                dcc.Graph(id='live-graph-h', config={'displaylogo': False})
            ], className="six columns"),
            html.Div([
                html.H3('G'),
                dcc.Graph(id='live-graph-g', config={'displaylogo': False})
            ], className="six columns")
        ], className="row"),
        html.Div([
            html.Div([
                html.H3('F'),
                dcc.Graph(id='live-graph-f', config={'displaylogo': False})
            ], className="six columns"),

            html.Div([
                html.H3('E'),
                dcc.Graph(id='live-graph-e', config={'displaylogo': False})
            ], className="six columns"),
        ], className="row"),
        html.Div([
            html.Div([
                html.H3('D'),
                dcc.Graph(id='live-graph-d', config={'displaylogo': False})
            ], className="six columns"),
            html.Div([
                html.H3('C'),
                dcc.Graph(id='live-graph-c', config={'displaylogo': False})
            ], className="six columns"),
        ], className="row"),
        html.Div([
            html.Div([
                html.H3('B'),
                dcc.Graph(id='live-graph-b', config={'displaylogo': False})
            ], className="six columns"),
            html.Div([
                html.H3('A'),
                dcc.Graph(id='live-graph-a', config={'displaylogo': False}),
                dcc.Interval(id='graph-update', interval=refresh_interval)
            ], className="six columns"),
        ], className="row")
    ])

    app.css.append_css({
        'external_url': 'https://codepen.io/chriddyp/pen/bWLwgP.css'
    })


def data_for_metric_line(mn):
    current_index = 0 if mn not in metric_to_current_index else metric_to_current_index[mn]

    data = []
    for l in lines:
        data.append(plotly.graph_objs.Scattergl(
            x=X[current_index: current_index + queue_len],
            y=Y[mn][l][current_index: current_index + queue_len],
            name=l,
            mode='lines+markers'))

    next_index = int(current_index + queue_len / 4)
    if next_index + queue_len > len(X):
        next_index = max(0, len(X) - queue_len)
    metric_to_current_index[mn] = next_index

    return {'data': data, 'layout': go.Layout(
        yaxis={'title': mn, 'autorange': True},
        xaxis={'title': 'Time', 'autorange': True})}


def data_for_metric_bar(mn):
    current_index = 0 if mn not in metric_to_current_index else metric_to_current_index[mn]
    data = []
    for l in lines:
        data.append(go.Bar(
            x=X[current_index: current_index + queue_len],
            y=Y[mn][l][current_index: current_index + queue_len],
            name=l
        ))
    next_index = int(current_index + queue_len / 4)
    if next_index + queue_len > len(X):
        next_index = max(0, len(X) - queue_len)
    metric_to_current_index[mn] = next_index

    return {'data': data, 'layout': go.Layout(barmode='stack',
                                              yaxis={'title': mn, 'autorange': True},
                                              xaxis={'title': 'Time', 'autorange': True})}


@app.callback(Output('bar_plot', 'figure'),
              events=[Event('graph-update', 'interval')])
def update_quantities():
    return data_for_metric_bar("i")


@app.callback(Output('live-graph-h', 'figure'),
              events=[Event('graph-update', 'interval')])
def update_graph_scatter_h():
    return data_for_metric_line("h")


@app.callback(Output('live-graph-g', 'figure'),
              events=[Event('graph-update', 'interval')])
def update_graph_scatter_g():
    return data_for_metric_line("g")


@app.callback(Output('live-graph-f', 'figure'),
              events=[Event('graph-update', 'interval')])
def update_graph_scatter_f():
    return data_for_metric_line("f")


@app.callback(Output('live-graph-e', 'figure'),
              events=[Event('graph-update', 'interval')])
def update_graph_scatter_e():
    return data_for_metric_line("e")


@app.callback(Output('live-graph-d', 'figure'),
              events=[Event('graph-update', 'interval')])
def update_graph_scatter_d():
    return data_for_metric_line("d")


@app.callback(Output('live-graph-c', 'figure'),
              events=[Event('graph-update', 'interval')])
def update_graph_scatter_c():
    return data_for_metric_line("c")


@app.callback(Output('live-graph-b', 'figure'),
              events=[Event('graph-update', 'interval')])
def update_graph_scatter_b():
    return data_for_metric_line("b")


@app.callback(Output('live-graph-a', 'figure'),
              events=[Event('graph-update', 'interval')])
def update_graph_scatter_a():
    return data_for_metric_line("a")

# -----------------------------------------------------------------------------

if __name__ == '__main__':
    main()

# =============================================================================

@dereal A few points:

Firstly, the events argument of app.callback is deprecated. Just use a standard inputs argument with n_intervals as the attribute, as seen here: https://dash.plot.ly/live-updates

Secondly, a callback whose input is still null fires immediately when the page loads (this will be filtered out by default in a future version of Dash, but must be done manually for now). In your case this creates a bunch of extra calls that the browser never catches up on when I run your test case. This can be alleviated by making your callbacks look like this:

from dash.dependencies import Input, Output
from dash.exceptions import PreventUpdate

And:

@app.callback(Output('bar_plot', 'figure'),
              inputs=[Input('graph-update', 'n_intervals')])
def update_quantities(interval_count):
    if interval_count is None:
        raise PreventUpdate
    return data_for_metric_bar("i")

Thirdly, you are making 9 POST requests per second, and there’s simply a lot of overhead to that. When I run your code, neither the web browser nor my Python process can keep up with the number of requests, which results in high CPU. There are some basic options:

  • Reduce the callback frequency (I find 10 seconds works fine instead of 1 second)
  • Reduce the complexity of each call
  • Reduce the framework overhead of each call. I don’t believe Python’s http.server is the most efficient, so you might want to look at how you can put the app behind a proper HTTP service like nginx or Apache
  • Batch the POST calls together. This could be done by outputting all 9 graphs each time (though this creates front-end issues), or with multi-output callbacks once they are supported: https://github.com/plotly/dash/pull/436

FYI, I did some profiling of the server-side code and most of the time is spent in select.select, which really makes me think you need to use a different framework than Python’s http.server.

Finally Dash just may not be the right framework for what you’re trying to do. But good luck!

Thanks for the proposals Damian.

I’m happy to use one post for the whole thing as the data is coming up all at once. Is there a working example of doing that?

Yeah, this is what I was meaning

import random

import dash
import dash_core_components as dcc
import dash_html_components as html
import plotly
import plotly.graph_objs as go
from dash.dependencies import Output, Input
from dash.exceptions import PreventUpdate


queue_len = 100 # the size of the shown window
metric_names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
lines = ['1', '2', '3', '4', '5', '6', '7', '8']

metric_to_current_index = {mn: 0 for mn in metric_names}
X = list(range(1, 100000))
Y = dict((m, dict((l, [random.random() for _ in range(100000)]) for l in lines)) for m in metric_names)

# initialise the server
app = dash.Dash(__name__)
app.config['suppress_callback_exceptions'] = True


def main():
    init_layout(1_000)
    app.run_server(host='0.0.0.0')


def init_layout(refresh_interval):
    app.layout = html.Div([
        dcc.Interval(id='graph-update', interval=refresh_interval),
        html.Div(id='all-graphs')
    ])

    app.css.append_css({
        'external_url': 'https://codepen.io/chriddyp/pen/bWLwgP.css'
    })


def data_for_metric_line(mn):
    current_index = 0 if mn not in metric_to_current_index else metric_to_current_index[mn]

    data = []
    for l in lines:
        data.append(plotly.graph_objs.Scattergl(
            x=X[current_index: current_index + queue_len],
            y=Y[mn][l][current_index: current_index + queue_len],
            name=l,
            mode='lines+markers'))

    next_index = int(current_index + queue_len / 4)
    if next_index + queue_len > len(X):
        next_index = max(0, len(X) - queue_len)
    metric_to_current_index[mn] = next_index

    return {'data': data, 'layout': go.Layout(
        yaxis={'title': mn, 'autorange': True},
        xaxis={'title': 'Time', 'autorange': True})}


def data_for_metric_bar(mn):
    current_index = 0 if mn not in metric_to_current_index else metric_to_current_index[mn]
    data = []
    for l in lines:
        data.append(go.Bar(
            x=X[current_index: current_index + queue_len],
            y=Y[mn][l][current_index: current_index + queue_len],
            name=l
        ))
    next_index = int(current_index + queue_len / 4)
    if next_index + queue_len > len(X):
        next_index = max(0, len(X) - queue_len)
    metric_to_current_index[mn] = next_index

    return {'data': data, 'layout': go.Layout(barmode='stack',
                                              yaxis={'title': mn, 'autorange': True},
                                              xaxis={'title': 'Time', 'autorange': True})}


@app.callback(Output('all-graphs', 'children'),
              inputs=[Input('graph-update', 'n_intervals')])
def update_all_graphs(interval_count):
    if interval_count is None:
        raise PreventUpdate

    children = [html.Div([
        html.H3('I'),
        dcc.Graph(id='bar_plot', config={'displaylogo': False}, figure=data_for_metric_bar("i"))
    ], className="row"),
    html.Div([
        html.Div([
            html.H3('H'),
            dcc.Graph(id='live-graph-h', config={'displaylogo': False}, figure=data_for_metric_line("h"))
        ], className="six columns"),
        html.Div([
            html.H3('G'),
            dcc.Graph(id='live-graph-g', config={'displaylogo': False}, figure=data_for_metric_line("g"))
        ], className="six columns")
    ], className="row"),
    html.Div([
        html.Div([
            html.H3('F'),
            dcc.Graph(id='live-graph-f', config={'displaylogo': False}, figure=data_for_metric_line("f"))
        ], className="six columns"),

        html.Div([
            html.H3('E'),
            dcc.Graph(id='live-graph-e', config={'displaylogo': False}, figure=data_for_metric_line("e"))
        ], className="six columns"),
    ], className="row"),
    html.Div([
        html.Div([
            html.H3('D'),
            dcc.Graph(id='live-graph-d', config={'displaylogo': False}, figure=data_for_metric_line("d"))
        ], className="six columns"),
        html.Div([
            html.H3('C'),
            dcc.Graph(id='live-graph-c', config={'displaylogo': False}, figure=data_for_metric_line("c"))
        ], className="six columns"),
    ], className="row"),
    html.Div([
        html.Div([
            html.H3('B'),
            dcc.Graph(id='live-graph-b', config={'displaylogo': False}, figure=data_for_metric_line("b"))
        ], className="six columns"),
        html.Div([
            html.H3('A'),
            dcc.Graph(id='live-graph-a', config={'displaylogo': False}, figure=data_for_metric_line("a"))
        ], className="six columns"),
    ], className="row")]

    return children


if __name__ == '__main__':
    main()

This solves some of your problem. In my testing the server and client are able to catch up with each other after about 10 seconds, and then it runs smoothly from there on.

I think, though, that you need to do some profiling of the calls to data_for_metric_line and data_for_metric_bar. If you could get these to execute faster, it would make your application scale a lot better. Unfortunately, from some quick testing it seems a lot of the slowdown occurs in initializing the plotly objects and in code that runs in BasePlotlyType, and I don’t really have the expertise to optimize that code path.
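For anyone wanting to repeat that kind of profiling, here is a minimal sketch using only the standard library. The function build_traces is a hypothetical stand-in for data_for_metric_line (it builds plain dicts rather than plotly objects), and profile_calls is an illustrative helper name, so swap in the real function to get meaningful numbers.

```python
import cProfile
import io
import pstats
import random

QUEUE_LEN = 100
Y = [random.random() for _ in range(100_000)]


def build_traces(start, n_lines=8):
    """Hypothetical stand-in for data_for_metric_line: one plain dict per line."""
    xs = list(range(start, start + QUEUE_LEN))
    ys = Y[start:start + QUEUE_LEN]
    return [{"x": xs, "y": ys, "name": str(i)} for i in range(n_lines)]


def profile_calls(func, n_calls=200):
    """Call func(i) repeatedly under cProfile and return the text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for i in range(n_calls):
        func(i)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


report = profile_calls(build_traces)
print(report)
```

Sorting by cumulative time makes it easy to see whether the time goes into your own function or into something it calls, such as object construction.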

Thanks so much for this, much appreciated!

To be honest, the CPU is still fairly high with this change, I don’t think it changes dramatically as you noticed.

Is there no way to reuse the plotly.graph_objs.Scattergl instances rather than recreating them every second? (in case the time is spent there)

It seems the __init__ of all the plotly.graph_objs classes is very slow, so I have created this hacky workaround that creates the objects before the app server starts and edits their properties as needed. For me the code runs 3-4x faster:

import random
from copy import deepcopy

import dash
import dash_core_components as dcc
import dash_html_components as html
import plotly.graph_objs as go
from dash.dependencies import Output, Input
from dash.exceptions import PreventUpdate


queue_len = 100 # the size of the shown window
metric_names = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
lines = ['1', '2', '3', '4', '5', '6', '7', '8']

# Cache Plotly Bar Graph Objects
plotly_bar_graph_objects = {}
plotly_bar_layout = {}
for n in metric_names:
    plotly_bar_layout[n] = go.Layout(
        barmode='stack',
        yaxis={'title': n, 'autorange': True},
        xaxis={'title': 'Time', 'autorange': True}
    )
    for l in lines:
        plotly_bar_graph_objects[n, l] = go.Bar(x=[], y=[], name=l)


# Cache Plotly Scatter Graph Objects
plotly_scatter_graph_objects = {}
plotly_scatter_layout = {}
for n in metric_names:
    plotly_scatter_layout[n] = go.Layout(
        yaxis={'title': n, 'autorange': True},
        xaxis={'title': 'Time', 'autorange': True}
    )
    for l in lines:
        plotly_scatter_graph_objects[n, l] = go.Scattergl(x=[], y=[], name=l, mode='lines+markers')


metric_to_current_index = {mn: 0 for mn in metric_names}
X = list(range(1, 100000))
Y = dict((m, dict((l, [random.random() for _ in range(100000)]) for l in lines)) for m in metric_names)

# initialise the server
app = dash.Dash(__name__)
app.config['suppress_callback_exceptions'] = True


def main():
    init_layout(1_000)
    app.run_server(host='0.0.0.0')


def init_layout(refresh_interval):
    app.layout = html.Div([
        dcc.Interval(id='graph-update', interval=refresh_interval),
        html.Div(id='all-graphs')
    ])

    app.css.append_css({
        'external_url': 'https://codepen.io/chriddyp/pen/bWLwgP.css'
    })


def data_for_metric_line(mn):
    current_index = 0 if mn not in metric_to_current_index else metric_to_current_index[mn]

    data = []
    for l in lines:
        scatter_graph = deepcopy(plotly_scatter_graph_objects[mn, l])
        scatter_graph.x = X[current_index: current_index + queue_len]
        scatter_graph.y = Y[mn][l][current_index: current_index + queue_len]
        data.append(scatter_graph)

    next_index = int(current_index + queue_len / 4)
    if next_index + queue_len > len(X):
        next_index = max(0, len(X) - queue_len)
    metric_to_current_index[mn] = next_index

    return {'data': data, 'layout': plotly_scatter_layout[mn]}


def data_for_metric_bar(mn):
    current_index = 0 if mn not in metric_to_current_index else metric_to_current_index[mn]
    data = []
    for l in lines:
        bar_chart = deepcopy(plotly_bar_graph_objects[mn, l])
        bar_chart.x = X[current_index: current_index + queue_len]
        bar_chart.y = Y[mn][l][current_index: current_index + queue_len]
        data.append(bar_chart)
    next_index = int(current_index + queue_len / 4)
    if next_index + queue_len > len(X):
        next_index = max(0, len(X) - queue_len)
    metric_to_current_index[mn] = next_index

    return {'data': data, 'layout': plotly_bar_layout[mn]}


@app.callback(Output('all-graphs', 'children'),
              inputs=[Input('graph-update', 'n_intervals')])
def update_all_graphs(interval_count):
    if interval_count is None:
        raise PreventUpdate

    children = [html.Div([
        html.H3('I'),
        dcc.Graph(id='bar_plot', config={'displaylogo': False}, figure=data_for_metric_bar("i"))
    ], className="row"),
        html.Div([
            html.Div([
                html.H3('H'),
                dcc.Graph(id='live-graph-h', config={'displaylogo': False}, figure=data_for_metric_line("h"))
            ], className="six columns"),
            html.Div([
                html.H3('G'),
                dcc.Graph(id='live-graph-g', config={'displaylogo': False}, figure=data_for_metric_line("g"))
            ], className="six columns")
        ], className="row"),
        html.Div([
            html.Div([
                html.H3('F'),
                dcc.Graph(id='live-graph-f', config={'displaylogo': False}, figure=data_for_metric_line("f"))
            ], className="six columns"),

            html.Div([
                html.H3('E'),
                dcc.Graph(id='live-graph-e', config={'displaylogo': False}, figure=data_for_metric_line("e"))
            ], className="six columns"),
        ], className="row"),
        html.Div([
            html.Div([
                html.H3('D'),
                dcc.Graph(id='live-graph-d', config={'displaylogo': False}, figure=data_for_metric_line("d"))
            ], className="six columns"),
            html.Div([
                html.H3('C'),
                dcc.Graph(id='live-graph-c', config={'displaylogo': False}, figure=data_for_metric_line("c"))
            ], className="six columns"),
        ], className="row"),
        html.Div([
            html.Div([
                html.H3('B'),
                dcc.Graph(id='live-graph-b', config={'displaylogo': False}, figure=data_for_metric_line("b"))
            ], className="six columns"),
            html.Div([
                html.H3('A'),
                dcc.Graph(id='live-graph-a', config={'displaylogo': False}, figure=data_for_metric_line("a"))
            ], className="six columns"),
        ], className="row")]

    return children


if __name__ == '__main__':
    main()

Yeah, in your first example I see the CPU high because the client and server never catch up with each other. In my first code sample the client and server do catch up with each other, but only barely, so the CPU is still high.

In this example you should see lower CPU on the server side at least, because in my testing on my computer each call takes 0.15 to 0.3 seconds to process rather than 0.6 to 0.9 seconds. The client side might still be high, but that’s just due to the sheer amount of new rendering this scenario takes.

The bottleneck in performance is now probably the actual slicing of your data. Careful use of a Pandas dataframe will probably be a lot faster than slicing Python objects.

Edit: Do note that because of the way these global variables are being mutated, this particular solution may not be suitable for a multi-user application. But this should hopefully give you some ideas on where the slowdown is occurring and how to work around it.
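One way to check whether slicing really is the bottleneck is to pull the window logic out and time it on its own. This is a rough sketch using only the standard library. The helper names next_window_start and window are made up for illustration, and the index arithmetic simply mirrors the code posted above.

```python
import random
import timeit

QUEUE_LEN = 100
X = list(range(1, 100000))
Y = [random.random() for _ in range(100000)]


def next_window_start(current, total, queue_len=QUEUE_LEN):
    """Advance by a quarter of a window, clamped so the window stays in range."""
    nxt = int(current + queue_len / 4)
    if nxt + queue_len > total:
        nxt = max(0, total - queue_len)
    return nxt


def window(start, queue_len=QUEUE_LEN):
    """Return the (x, y) slices for the window starting at `start`."""
    return X[start:start + queue_len], Y[start:start + queue_len]


# Time 10,000 window slices and report the average cost of one.
n_runs = 10_000
seconds = timeit.timeit(lambda: window(5000), number=n_runs)
print(f"{seconds / n_runs * 1e6:.2f} microseconds per window")
```

If the per-window time is small compared with the per-callback time measured above, the slicing is not where the time goes and switching data structures is unlikely to help much.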

Edit 2: Added deepcopy for getting the graph objects, which should mean it’s safe to use in multi-user environment.
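To illustrate why the deepcopy matters, here is a small sketch of the same pattern with plain dicts standing in for the cached plotly objects. TRACE_TEMPLATES and trace_for_request are hypothetical names, not part of the code above. Note that this only covers the trace objects: the metric_to_current_index dict in the code above is still shared global state.

```python
from copy import deepcopy

# Hypothetical stand-in for a cached plotly trace, built once at startup.
TRACE_TEMPLATES = {
    ("a", "1"): {"type": "scattergl", "name": "1", "x": [], "y": []},
}


def trace_for_request(metric, line, xs, ys):
    """Copy the cached template and fill in data for this request only."""
    trace = deepcopy(TRACE_TEMPLATES[metric, line])
    trace["x"] = xs
    trace["y"] = ys
    return trace


first = trace_for_request("a", "1", [1, 2, 3], [0.1, 0.2, 0.3])
second = trace_for_request("a", "1", [4, 5, 6], [0.4, 0.5, 0.6])

# Each request gets its own object, and the cached template stays empty.
print(first["x"], second["x"])
print(TRACE_TEMPLATES["a", "1"]["x"])
```

Without the copy, both requests would write into the same cached object, so one user’s data could end up in another user’s response.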

Damian, this is F A N T A S T I C!!! thank you so much!

With these changes the Python process is at about 5% CPU (it was at about 100% before), and the browser is at about 30-40%. Really much better, thanks!

Great to hear! I’ve updated the code a bit to make it safe in a multi-user environment. It adds 10-15% overhead in my testing, but it means that different users won’t get their data mixed up.

Thanks. It’s definitely good to know, but for my use case I’d rather avoid the extra 15% (Python gets to ~20% on my machine), as it’s not really for a multi-user environment.

Hi Damian, thanks a lot for this. It’s really helpful. However, is it possible to give an example of the charts using plotly.express instead of plotly.graph_objs?

Thanks a bunch again.