Managed to reduce plot generation time for a large dataset by 67% by tweaking deepcopy

Hi all,

I wanted to share my experience finding a faster way to generate plots for large datasets. I'm by no means an expert on these things, so a word of caution to anyone who tries the same: I just found that it worked for my specific purposes without any outright drawbacks for me.

I have a large GeoJSON dataset featuring some rather weird shapes that I wanted to plot using the px.choropleth_mapbox() function. Ideally, I wanted it to build quickly, but building the plot and rendering it took a while. To save time, I generated the plot with px.choropleth_mapbox(), saved it as a JSON file, and used ujson.load() along with go.Figure() to regenerate it more quickly. Speed gains were limited.
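In case it helps, the caching step looked roughly like this. This is a minimal sketch, not my exact code: my_shapes.geojson, region_id, and value are placeholder names standing in for my actual files and columns.

```python
import ujson
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# Placeholder names -- substitute your own files and columns.
with open("my_shapes.geojson") as f:
    geojson = ujson.load(f)
df = pd.DataFrame({"region_id": ["A", "B"], "value": [1.0, 2.0]})

# Slow path: build the figure once and cache it as JSON.
fig = px.choropleth_mapbox(
    df, geojson=geojson, locations="region_id", color="value",
    mapbox_style="carto-positron",
)
with open("figure_cache.json", "w") as f:
    f.write(fig.to_json())

# Later runs: rebuild the figure from the cached JSON instead.
with open("figure_cache.json") as f:
    fig = go.Figure(ujson.load(f))
fig.show()
```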

I ran fig.show() through a profiler and found that about 30% of the time was being spent in deepcopy and _deepcopy_list. I read on StackOverflow that deepcopy is fairly slow, and that pickle or ujson can be used as alternatives.
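For reference, the profiling was just the standard-library profiler, something like the following (assuming fig is the figure built above):

```python
import cProfile
import pstats

# Profile the slow call and dump the stats to a file.
cProfile.run("fig.show()", "show_profile")

# Sort by cumulative time; copy.deepcopy and its helpers
# (e.g. _deepcopy_list) show up near the top for large figures.
pstats.Stats("show_profile").sort_stats("cumulative").print_stats(20)
```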

To test this, I went into plotly's basedatatypes.py, traced where the calls were being made, and changed the deepcopy calls to ujson.loads(ujson.dumps(var)).
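The substitution itself boils down to the pattern below. fast_copy is just an illustrative name, not anything in plotly; the round-trip only works because the values being copied here are plain JSON-serializable dicts and lists, which is why it's the kind of change worth verifying on your own figures.

```python
import copy
import ujson

def fast_copy(var):
    # Stand-in for the copy.deepcopy(var) calls in plotly's
    # basedatatypes.py. Only valid for JSON-serializable data:
    # tuples come back as lists, and non-JSON types will raise.
    return ujson.loads(ujson.dumps(var))

spec = {"type": "choroplethmapbox", "z": [1.0, 2.5, 3.2]}
assert fast_copy(spec) == copy.deepcopy(spec)
```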

Here are the results of my test:
[image: test results]

I'm still testing, but results don't seem to be adversely impacted by the changes so far, and it has helped me work much faster.

How large are the datasets that you tested?

The GeoJSON file is 250 MB, though I'm likely to test bigger ones now. Before, when I used px.choropleth_mapbox(), there was a 50% chance it would generate and a 50% chance it would just remain frozen.