Performance Profiling Dash apps with Werkzeug

Hi! Just wanted to share some tips on using the Werkzeug profiler in your Dash app. I’ve found this to be really useful for finding bottlenecks in an app’s performance. Hope that this is helpful for others out there.

Install and set up the profiling tool

  1. Install werkzeug (pip install werkzeug)
  2. Edit your index.py or app.py file to include:
import os

from werkzeug.middleware.profiler import ProfilerMiddleware

if __name__ == "__main__":
    # Only wrap the app in the profiler when the PROFILER env var is set
    if os.getenv("PROFILER", None):
        app.server.config["PROFILE"] = True
        app.server.wsgi_app = ProfilerMiddleware(
            app.server.wsgi_app, sort_by=("cumtime", "tottime"), restrictions=[50]
        )
    app.run_server(debug=True)

The restrictions option limits how many lines of profiling output are printed per request; ~50 is usually adequate. You can also tell it to sort_by different columns, say tottime first. That provides a different perspective, and it is worth experimenting to find what works for you.
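For instance, a minimal variant of the middleware setup above that sorts by tottime alone and prints a shorter listing (same app object as before; only the arguments change):

app.server.wsgi_app = ProfilerMiddleware(
    app.server.wsgi_app,
    sort_by=("tottime",),   # order by time spent in each function itself
    restrictions=[30],      # print only the top 30 entries per request
)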

  3. Run the app
PROFILER="True" python app.py

or

PROFILER="True" python index.py

Because the profiler only activates when the PROFILER environment variable is set, you don’t have to worry about removing the profiler code when you deploy or commit.
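One caveat: os.getenv returns the raw string, so even PROFILER="False" would be truthy and switch the profiler on. A stricter, hypothetical variant of the guard:

import os

# Treat only explicitly "truthy" strings as enabling the profiler;
# PROFILER="False" or PROFILER="0" will leave it off.
PROFILING_ENABLED = os.getenv("PROFILER", "").lower() in ("1", "true", "yes")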

Interpret the results

This will give you a lot of output, looking something like this…

PATH: '/_dash-update-component'
         185053 function calls (179645 primitive calls) in 0.101 seconds

   Ordered by: cumulative time, internal time
   List reduced from 1903 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.101    0.101 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/venv/lib/python3.8/site-packages/werkzeug/middleware/profiler.py:102(runapp)
        1    0.000    0.000    0.101    0.101 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/venv/lib/python3.8/site-packages/flask/app.py:2043(wsgi_app)
        1    0.000    0.000    0.100    0.100 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/venv/lib/python3.8/site-packages/flask/app.py:1504(full_dispatch_request)
        1    0.000    0.000    0.100    0.100 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/venv/lib/python3.8/site-packages/flask/app.py:1480(dispatch_request)
        1    0.000    0.000    0.100    0.100 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/venv/lib/python3.8/site-packages/dash/dash.py:1267(dispatch)
        1    0.000    0.000    0.100    0.100 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/venv/lib/python3.8/site-packages/dash/_callback.py:141(add_context)
        1    0.000    0.000    0.099    0.099 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/pages/home.py:18(populate_data)
        1    0.000    0.000    0.066    0.066 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/utils/chart_utils.py:5(multi_line_chart)
        1    0.000    0.000    0.046    0.046 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/venv/lib/python3.8/site-packages/plotly/subplots.py:45(make_subplots)
      634    0.000    0.000    0.043    0.000 /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/importlib/__init__.py:109(import_module)
  635/634    0.000    0.000    0.043    0.000 <frozen importlib._bootstrap>:1002(_gcd_import)
  636/635    0.001    0.000    0.042    0.000 <frozen importlib._bootstrap>:986(_find_and_load)
        9    0.000    0.000    0.039    0.004 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/venv/lib/python3.8/site-packages/plotly/basedatatypes.py:5055(update)
  494/327    0.000    0.000    0.038    0.000 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/venv/lib/python3.8/site-packages/_plotly_utils/importers.py:23(__getattr__)
    21/18    0.000    0.000    0.037    0.002 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/venv/lib/python3.8/site-packages/plotly/basedatatypes.py:3832(_perform_update)
  154/153    0.000    0.000    0.036    0.000 <frozen importlib._bootstrap>:956(_find_and_load_unlocked)
  141/139    0.000    0.000    0.036    0.000 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/venv/lib/python3.8/site-packages/plotly/basedatatypes.py:161(_check_path_in_prop_tree)
        2    0.000    0.000    0.035    0.018 /Users/hannahker/Desktop/services-onboarding/services-onboarding-hannah/venv/lib/python3.8/site-packages/plotly/basedatatypes.py:1379(update_layout)

Each section in the profiler output corresponds to a single request, which maps onto a single callback execution. You can tell which callback it was by looking at the specific function names that were invoked.
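This is one reason to give callback functions descriptive names. A minimal, hypothetical sketch (assuming Dash 2-style imports; the component IDs and function name are made up for illustration):

from dash import Dash, dcc, html, Input, Output

app = Dash(__name__)
app.layout = html.Div([dcc.Dropdown(id="dropdown"), dcc.Graph(id="graph")])

# The function name below is exactly what appears in the profiler output
# under filename:lineno(function), e.g. pages/home.py:18(populate_data),
# which is how you match a profiled request back to its callback.
@app.callback(Output("graph", "figure"), Input("dropdown", "value"))
def populate_data(selected_value):
    return {}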

The column meanings are as follows:

ncalls - Number of calls made to this specific function. When two numbers appear (e.g. 635/634), the first is the total number of calls and the second is the number of primitive (non-recursive) calls.

tottime - Total time spent executing this function itself, excluding time spent in nested function calls.

percall - Average time per call of this function (tottime divided by ncalls).

cumtime - Cumulative time spent in this function and all of its nested function calls, i.e. the time from entry to return, summed over all calls.

percall - Average cumulative time per call (cumtime divided by the number of primitive calls).

filename:lineno(function) - The file path, line number, and name of the executed function.
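To make the relationship between the columns concrete, here is the arithmetic for the plotly basedatatypes.py update row from the log above (the numbers are taken directly from that row):

# From the log above:
# ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
#      9   0.000    0.000    0.039    0.004   ...basedatatypes.py:5055(update)
ncalls = 9
cumtime = 0.039           # time in update() plus everything it called, over all 9 calls
print(cumtime / ncalls)   # ~0.0043 -> the second percall column (0.004)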

A useful way to read this output is to let the cumulative numbers guide you to the general regions where the app is spending its time. The first several calls are uninteresting: they are the Flask and Dash request and callback wrappers that stay open while all the inner functions execute and return. By definition, these outer calls must run longer than any function that executes and finishes within their stack frames.

Once we get to the callback itself, things get more interesting. The next few calls generally tell us which major functions take the longest. You can see whether it is the function itself that is taking a long time or its children by comparing cumtime to tottime. If tottime is basically 0 but cumtime is much larger, then this function isn’t doing any real work itself; the child stacks are doing the real work, and we must keep searching deeper in the call stack. In the example above, populate_data takes a lot of cumulative time (0.099s) but barely any total time, meaning it isn’t doing real work itself and we’d keep searching lower in its stack frame. (In this example you’d need to look at many more listings to see the numerous Pandas calls that together add up to a larger number; they come from a number of processing routines inside the callback, which the excerpt above doesn’t show since I didn’t paste the full logs.)

On the other hand, if tottime is nearly equal to cumtime, then we know it is this specific function that is bogged down. These are the functions to start looking hard at.
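A toy example (hypothetical functions, unrelated to the app above) makes the distinction easy to see:

import cProfile

def slow_child():
    # Busy work inside this frame: its tottime and cumtime are both large.
    total = 0
    for i in range(2_000_000):
        total += i
    return total

def thin_parent():
    # Does no real work of its own: tottime ~0, but cumtime equals the
    # child's runtime, which is the signal to keep digging deeper.
    return slow_child()

cProfile.run("thin_parent()", sort="cumtime")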

We can also use a tool like snakeviz to visualize these logs (the output gets big pretty fast!). Its icicle and sunburst graphs show which functions are taking up the most time. To run snakeviz, point it at the directory where werkzeug writes its .prof files; you can set this directory via the profile_dir argument, as shown below. You can also turn off the log streaming to your terminal (stream=None) if you just want to look at the results through snakeviz. For example:

import os

from werkzeug.middleware.profiler import ProfilerMiddleware

PROF_DIR = "directory_to_prof_files"

if __name__ == "__main__":
    if os.getenv("PROFILER", None):
        app.server.config["PROFILE"] = True
        os.makedirs(PROF_DIR, exist_ok=True)  # make sure the output directory exists
        app.server.wsgi_app = ProfilerMiddleware(
            app.server.wsgi_app,
            sort_by=["cumtime"],
            restrictions=[50],
            stream=None,           # suppress terminal output
            profile_dir=PROF_DIR,  # write one .prof file per request here
        )
    app.run_server(debug=True)

After pip installing snakeviz, you’d then run the following in a separate terminal to start a local server where you can interactively explore your profiling results:

snakeviz <directory_to_prof_files>
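(snakeviz also accepts a single .prof file if you want to jump straight into one request’s profile, e.g. snakeviz directory_to_prof_files/some_request.prof, where the filename is a placeholder for whatever werkzeug generated.)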

This will take you to a page listing all of the .prof files created (one file per profiled request).

Clicking on one of the files will then take you to a visualization where you can explore the call stack. You can see which calls take the most time based on the proportion of the figure’s width that they take up.
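If you’d rather stay in the terminal, the same .prof files can also be read with the standard library’s pstats module (the path below is a placeholder for one of werkzeug’s output files):

import pstats

# Load one of the .prof files that werkzeug wrote to profile_dir
# (replace the path with an actual generated filename).
stats = pstats.Stats("directory_to_prof_files/some_request.prof")
stats.sort_stats("cumtime").print_stats(20)  # top 20 entries by cumulative time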


Such a good tutorial, thanks for posting @hannahker!

As they say, the first step in improving the performance of software is to measure it!
