go.Figure slow with lots of data

Does anyone know any tricks to speed up plot generation? Specifically, figure() takes around 25 seconds to run, which is fine for automatically generated plots but could be kind of slow for plots that are generated on demand. I have three traces with 302,000 points each. The x axis is a datetime, while the other axes are float32.

Here’s what the python profiler spits out for figure:

 70876243 function calls (62706319 primitive calls) in 25.867 seconds

   Ordered by: cumulative time
   List reduced from 1855 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.000    0.000   25.867   25.867 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\graph_objs\_figure.py:14(__init__)
1    0.006    0.006   25.867   25.867 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\basedatatypes.py:43(__init__)
8165169/311   11.796    0.000   25.673    0.083 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\copy.py:132(deepcopy)
  345/311    0.001    0.000   25.672    0.083 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\copy.py:236(_deepcopy_dict)
   27    0.725    0.027   25.670    0.951 {method '__deepcopy__' of 'numpy.ndarray' objects}
1    0.030    0.030   16.976   16.976 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\_plotly_utils\basevalidators.py:2242(validate_coerce)
2721459/2721444    3.796    0.000   11.562    0.000 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\copy.py:268(_reconstruct)
1    0.021    0.021    8.792    8.792 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\basedatatypes.py:148(<listcomp>)
5    0.018    0.004    8.266    1.653 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\basedatatypes.py:3414(to_plotly_json)
  8164347    1.922    0.000    7.546    0.000 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\copy.py:273(<genexpr>)
 16334987    2.211    0.000    2.211    0.000 {method 'get' of 'dict' objects}
  2721834    1.339    0.000    1.926    0.000 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\copy.py:252(_keep_alive)
 13609127    1.232    0.000    1.232    0.000 {built-in method builtins.id}
  5443528    0.693    0.000    0.693    0.000 {built-in method builtins.getattr}
  2721429    0.689    0.000    0.689    0.000 {method '__reduce_ex__' of 'datetime.datetime' objects}
  2737444    0.404    0.000    0.404    0.000 {built-in method builtins.isinstance}
  2728579    0.327    0.000    0.327    0.000 {method 'append' of 'list' objects}
  2721495    0.311    0.000    0.311    0.000 {built-in method builtins.issubclass}
  2721891    0.232    0.000    0.232    0.000 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\copy.py:190(_deepcopy_atomic)
 3245/373    0.005    0.000    0.106    0.000 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\basedatatypes.py:2745(__setitem__)
   298/38    0.001    0.000    0.097    0.003 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\_plotly_utils\basevalidators.py:2090(validate_coerce)
   297/60    0.002    0.000    0.097    0.002 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\basedatatypes.py:3067(_set_compound_prop)
   121/61    0.000    0.000    0.078    0.001 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\basedatatypes.py:3697(__setitem__)
2    0.000    0.000    0.058    0.029 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\graph_objs\_layout.py:3758(__init__)
3651/2354    0.003    0.000    0.053    0.000 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\basedatatypes.py:2811(__setattr__)
1    0.000    0.000    0.051    0.051 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\basedatatypes.py:1340(_initialize_layout_template)
27/14    0.000    0.000    0.051    0.004 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\basedatatypes.py:3723(__setattr__)
1    0.000    0.000    0.051    0.051 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\graph_objs\_layout.py:2350(template)
1    0.000    0.000    0.050    0.050 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\_plotly_utils\basevalidators.py:2344(validate_coerce)
1    0.000    0.000    0.050    0.050 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\graph_objs\layout\_template.py:188(__init__)
  339/301    0.003    0.000    0.033    0.000 <frozen importlib._bootstrap>:978(_find_and_load)
4    0.001    0.000    0.028    0.007 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\graph_objs\_scattergl.py:1535(__init__)
  525/487    0.001    0.000    0.024    0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
 42/2    0.000    0.000    0.023    0.011 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
2    0.000    0.000    0.023    0.011 {built-in method builtins.__import__}
 40/2    0.000    0.000    0.023    0.011 <frozen importlib._bootstrap>:948(_find_and_load_unlocked)
 40/2    0.000    0.000    0.022    0.011 <frozen importlib._bootstrap>:663(_load_unlocked)
 40/2    0.000    0.000    0.022    0.011 <frozen importlib._bootstrap_external>:722(exec_module)
 40/2    0.000    0.000    0.021    0.011 {built-in method builtins.exec}
1    0.000    0.000    0.021    0.021 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\graph_objs\layout\template\_data.py:956(__init__)
1    0.000    0.000    0.019    0.019 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\validators\layout\template\data\__init__.py:1(<module>)
 2868    0.004    0.000    0.017    0.000 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\basedatatypes.py:3003(_set_prop)
2    0.000    0.000    0.012    0.006 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\plotly\graph_objs\layout\_scene.py:1428(__init__)
  299    0.001    0.000    0.012    0.000 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\site-packages\_plotly_utils\basevalidators.py:2064(data_class)
  299    0.000    0.000    0.011    0.000 C:\Users\dgoodell\AppData\Local\Programs\Python\Python37\lib\importlib\__init__.py:109(import_module)
  299    0.000    0.000    0.011    0.000 <frozen importlib._bootstrap>:994(_gcd_import)
   40    0.000    0.000    0.010    0.000 <frozen importlib._bootstrap>:882(_find_spec)
  126    0.000    0.000    0.010    0.000 <frozen importlib._bootstrap_external>:74(_path_stat)
  126    0.010    0.000    0.010    0.000 {built-in method nt.stat}
   40    0.000    0.000    0.010    0.000 <frozen importlib._bootstrap_external>:1272(find_spec)

Could you share some example code, or describe the approach you have used so far?

Also, the third line of the profile shows a massive number of calls to the deepcopy function. Could this be the problem in your case?
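The call counts in the profile can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted above (roughly 302,000 points per trace, 8,165,169 calls to deepcopy, and 2,721,429 calls to __reduce_ex__ on datetime.datetime objects). The interpretation in the comments is an inference from those numbers, not something confirmed in this thread.

```python
# Numbers copied from the profiler output and the original post.
points_per_trace = 302_000          # approximate, as stated in the post
deepcopy_calls = 8_165_169          # total calls to copy.deepcopy
datetime_reduce_calls = 2_721_429   # calls to datetime.__reduce_ex__

# Deep-copying one datetime object calls __reduce_ex__ once, so this ratio
# estimates how many full passes were made over one datetime x column.
passes_over_x = datetime_reduce_calls / points_per_trace

# Number of deepcopy calls made for each copied datetime object.
deepcopy_per_datetime = deepcopy_calls / datetime_reduce_calls

print(f"passes over a datetime column: {passes_over_x:.2f}")          # 9.01
print(f"deepcopy calls per datetime:   {deepcopy_per_datetime:.2f}")  # 3.00
```

About nine passes over the x column is consistent with each of the three traces having its datetime values deep-copied about three times, one Python object at a time. That would suggest the datetime column, rather than the float32 columns, accounts for most of the deepcopy time.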

Btw, more than 300,000 is really a lot of points. Also, can you specify which figure you mean? Is it figure from the matplotlib library?

The only thing that was profiled in the above post was the call to plotly’s go.Figure().
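For anyone wanting to produce a similar report, here is a minimal sketch of profiling a single call with the standard-library cProfile and pstats modules, sorted by cumulative time and limited to 50 rows as in the output above. The function build_figure_stand_in is a hypothetical placeholder for go.Figure(...): it only deep-copies a list of datetime objects so that the example runs without plotly installed.

```python
import copy
import cProfile
import io
import pstats
from datetime import datetime, timedelta


def build_figure_stand_in(n_points=20_000):
    """Hypothetical stand-in for go.Figure(...): deep-copies datetime values."""
    start = datetime(2019, 1, 1)
    x = [start + timedelta(seconds=i) for i in range(n_points)]
    return copy.deepcopy({"x": x})


def profile_call(func, *args, top=50, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Same ordering and row limit as the report in the original post.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


result, report = profile_call(build_figure_stand_in)
print(report)
```

To profile the real call, replace build_figure_stand_in with a small function that wraps go.Figure(data=data, layout=layout). Keep in mind that the profiler itself adds overhead, so the absolute times it reports tend to be higher than an unprofiled run.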

Here’s an abridged version of the program to show the basic structure:

import numpy as np
import pandas
import plotly.offline
import plotly.graph_objs as go

# filename and convertorname are defined elsewhere in the full program.
columns = ['datetime', 'a_gpib_alt_power_w', 'b_gpib_alt_power_w', 'ambient_tc_c']
datatypes = { 'a_gpib_alt_power_w': np.float32,  'b_gpib_alt_power_w': np.float32, 
            'ambient_tc_c': np.float32}

alldata = pandas.read_csv(filename, delimiter='\t', header=1, engine='c', 
          usecols=columns, dtype=datatypes, na_filter=False, low_memory=False, 
          parse_dates=['datetime'], infer_datetime_format=True, 
          encoding = "ISO-8859-1")

alldata = alldata.sort_values('datetime').reset_index(drop=True) 

trace1 = go.Scattergl(x=alldata['datetime'], y=alldata['a_gpib_alt_power_w'], 
      name='A Alt Power',  yaxis='y2', mode = 'lines+markers', 
      line = dict(width = 1, color = '#1f77b4'), 
      marker = dict(size = 2, color = '#1f77b4')) #muted blue
trace2 = go.Scattergl(x=alldata['datetime'], y=alldata['b_gpib_alt_power_w'], 
      name='B Alt Power',  yaxis='y2', mode = 'lines+markers', 
      line = dict(width = 1, color = '#17becf'), 
      marker = dict(size= 2, color = '#17becf')) #blue-teal
trace3 = go.Scattergl(x=alldata['datetime'], y=alldata['ambient_tc_c'], 
      name='Ambient Temp', yaxis='y3', mode = 'lines+markers', 
      line = dict(width = 1, color = '#ff7f0e'), 
      marker = dict(size = 2, color = '#ff7f0e')) # safety orange

data = [trace1, trace2, trace3]

layout = go.Layout(
    xaxis=dict(
        autorange=False,
        range=[alldata['datetime'].min(), alldata['datetime'].max()]
    )
)

# This is the call that I profiled in the original post.
fig = go.Figure(data=data, layout=layout)
plotly.offline.plot(fig, filename=convertorname+'_weekly.html', auto_open=False)

Hi @batdan,

Based on reading your code, I would have expected the plotly.offline.plot call to be taking up most of the time. In my experience the go.Figure constructor call can get slow with lots of traces, but if you have only a few traces with lots of data then the long pole is typically the call to *.plot/*.iplot. Could you time go.Figure(...) and plotly.offline.plot separately?
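A minimal sketch of timing the two stages separately with time.perf_counter. The two timed statements are hypothetical stand-ins (copy.deepcopy for go.Figure(...) and json.dumps for plotly.offline.plot(...)) so that the example runs on its own; in the real program the actual calls would go inside the with blocks.

```python
import copy
import json
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Hypothetical stand-in data: one trace with 100,000 float values.
figure_spec = {"data": [{"y": [float(i) for i in range(100_000)]}]}

with timed("construct"):
    fig = copy.deepcopy(figure_spec)   # stand-in for go.Figure(...)

with timed("serialize"):
    payload = json.dumps(fig)          # stand-in for plotly.offline.plot(...)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Unlike a full profiler run, this adds almost no overhead, so the numbers should be close to what the program normally takes.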

The main reason that plotly.offline.plot gets slow for large arrays is that the array gets serialized to a JSON list. If you’re working in the Jupyter notebook you can display the figure as a go.FigureWidget instance (https://plot.ly/python/figurewidget/), in which case the large arrays are transferred to the JavaScript library as binary buffers, which is a lot faster.
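To give a rough feel for the difference between a JSON list and a binary buffer, here is a small standard-library sketch. It is only an illustration of the size difference for float32 data, not the code path that plotly or FigureWidget actually uses, and the values are made up.

```python
import json
from array import array

n_points = 302_000  # same order of magnitude as one trace in this thread

# Made-up float values standing in for one y column.
values = [i * 0.001 for i in range(n_points)]

# Text form: every number is written out as decimal digits plus separators.
json_text = json.dumps(values)

# Binary form: typecode "f" stores each value as a C float (4 bytes here).
binary = array("f", values).tobytes()

print(f"JSON list:     {len(json_text):>10,} characters")
print(f"binary buffer: {len(binary):>10,} bytes")
```

The JSON form also has to be produced one Python object at a time, while the binary form is a single contiguous block, which is part of why the binary route tends to be faster.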

If the go.Figure call by itself is the slow part, you can bypass the validation work that the graph_objs objects do by defining your figure in terms of raw dict and list instances. Then you can set the validate=False argument to plotly.offline.plot to skip validation. Something like

# type='scattergl' keeps the same trace type as the go.Scattergl version;
# a plain dict without a type is treated as a regular scatter trace.
trace1 = dict(type='scattergl', x=alldata['datetime'], y=alldata['a_gpib_alt_power_w'], 
      name='A Alt Power',  yaxis='y2', mode = 'lines+markers', 
      line = dict(width = 1, color = '#1f77b4'), 
      marker = dict(size = 2, color = '#1f77b4')) #muted blue
trace2 = dict(type='scattergl', x=alldata['datetime'], y=alldata['b_gpib_alt_power_w'], 
      name='B Alt Power',  yaxis='y2', mode = 'lines+markers', 
      line = dict(width = 1, color = '#17becf'), 
      marker = dict(size= 2, color = '#17becf')) #blue-teal
trace3 = dict(type='scattergl', x=alldata['datetime'], y=alldata['ambient_tc_c'], 
      name='Ambient Temp', yaxis='y3', mode = 'lines+markers', 
      line = dict(width = 1, color = '#ff7f0e'), 
      marker = dict(size = 2, color = '#ff7f0e')) # safety orange

data = [trace1, trace2, trace3]

layout = dict(
    xaxis=dict(
        autorange=False,
        range=[alldata['datetime'].min(), alldata['datetime'].max()]
    )
)

# A plain dict replaces the go.Figure call; validate=False skips validation.
fig = dict(data=data, layout=layout)
plotly.offline.plot(fig, filename=convertorname+'_weekly.html', auto_open=False, validate=False)

-Jon

I timed go.Figure() and plotly.offline.plot() separately (it looks like the python profiler was doubling the time everything took):

go.Figure() = 11.4 seconds
plotly.offline.plot() = 13.8 seconds.

I changed it to just create dicts directly instead of calling go.Figure, and that takes <0.1 seconds, which is a lot better than the 11.4 seconds of using go.Figure()! However, this makes plotly.offline.plot() take 15.8 seconds rather than 13.8 seconds. Still, it’s a large improvement.

Adding validate=False to plotly.offline.plot() has no impact on the time that it takes.

I’ll have to investigate whether I can take advantage of the Jupyter Notebook features to pass the binary data directly to the plotly plot and save even more time. It’s either that or figure out how to speed up the creation of the JSON list, which I know nothing about.

I am using this to automatically generate plots every week so multiple engineers can review the operation of continuously operating long-term tests. It’s convenient that it spits out a single HTML file with all the data so people can just open a file on a network share and review the data quickly and easily. It’s too bad the data in the file isn’t compressed, as it’s highly compressible.
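As a rough illustration of how compressible this kind of numeric text can be, here is a small sketch using the standard-library gzip module. The payload is a made-up stand-in for the JSON data embedded in the HTML file, and it is far more repetitive than real measurements would be, so the ratio it prints is only indicative.

```python
import gzip
import json

# Made-up, repetitive values standing in for one column of logged data.
values = [round(20.0 + 0.01 * (i % 500), 2) for i in range(100_000)]

# Stand-in for the JSON text that ends up embedded in the HTML file.
payload = json.dumps({"y": values}).encode("utf-8")

compressed = gzip.compress(payload)
ratio = len(compressed) / len(payload)

print(f"original:   {len(payload):,} bytes")
print(f"compressed: {len(compressed):,} bytes ({ratio:.1%} of original)")
```

Note that a .gz file would generally need to be decompressed before a browser can open it from a network share, so compressing the output like this helps with storage or transfer rather than with viewing.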

Thanks for the help!

“In my experience the go.Figure constructor call can get slow with lots of traces”

Hm. In my case the call to go.Figure takes almost a second. It draws a stacked bar chart with 10 traces, each with 6 data points. I have 6 of those plots on one page, so in total loading the page takes more than 6 seconds.

Is this expected performance, and is there something I can do to optimize it?
The plots are somewhat dynamic: when a dropdown changes, they need to be redrawn. Pre-generating all possible graphs (100+ countries) would probably be an option, but first I wanted to ask if there is anything else I could do…
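One possible approach, not confirmed by anyone in this thread, is to combine the plain-dict figures suggested earlier with a cache keyed by the dropdown value, so each country's figure is built at most once. In the sketch below, load_values and figure_for are hypothetical names and the data is made up; whether the page framework accepts a plain dict as a figure is also an assumption.

```python
from functools import lru_cache

build_calls = 0  # counts how often a figure is actually built


def load_values(country):
    """Hypothetical data lookup: 10 series of 6 made-up values each."""
    return [[(len(country) + s + i) % 7 for i in range(6)] for s in range(10)]


@lru_cache(maxsize=None)
def figure_for(country):
    """Build a stacked bar figure as plain dicts, cached per country."""
    global build_calls
    build_calls += 1
    traces = [
        dict(type="bar", name=f"series {s}", x=list(range(6)), y=ys)
        for s, ys in enumerate(load_values(country))
    ]
    return dict(data=traces, layout=dict(barmode="stack"))


first = figure_for("Germany")   # built on the first request
second = figure_for("Germany")  # served from the cache
print(build_calls)              # 1
```

The cache hands back the same dict object on every hit, so callers should treat the returned figure as read-only, or copy it before changing anything.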