Managed to reduce plot generation time for a large dataset by 67% by tweaking deepcopy

Hi all,

I wanted to share my experience finding a faster way to generate plots for large datasets. I'm by no means an expert on these things, so a word of caution to anyone who tries the same: I just found that it worked for my specific purposes without any outright drawbacks for me.

I have a large GeoJSON dataset featuring some rather weird shapes that I wanted to plot using the px.choropleth_mapbox() function. Ideally, I wanted it to build quickly, but building the plot and rendering it took a while. To save time, I generated the plot with px.choropleth_mapbox(), saved it as a JSON file, and used ujson.load() along with go.Figure() to regenerate it more quickly. Speed gains were limited.
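In case it helps, the caching step looked roughly like this. This is a minimal sketch, not my exact code: my_shapes.geojson, region_id, and value are placeholder names standing in for my actual files and columns.

```python
import ujson
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

# Placeholder names -- substitute your own files and columns.
with open("my_shapes.geojson") as f:
    geojson = ujson.load(f)
df = pd.DataFrame({"region_id": ["A", "B"], "value": [1.0, 2.0]})

# Slow path: build the figure once and cache it as JSON.
fig = px.choropleth_mapbox(
    df, geojson=geojson, locations="region_id", color="value",
    mapbox_style="carto-positron",
)
with open("figure_cache.json", "w") as f:
    f.write(fig.to_json())

# Later runs: rebuild the figure from the cached JSON instead.
with open("figure_cache.json") as f:
    fig = go.Figure(ujson.load(f))
fig.show()
```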

I ran fig.show() through a profiler and found that about 30% of the time was being spent in deepcopy and _deepcopy_list. I read on StackOverflow that deepcopy is fairly slow, and that pickle or ujson can be used as alternatives.
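For reference, the profiling was just the standard-library profiler, something like the following (assuming fig is the figure built above):

```python
import cProfile
import pstats

# Profile the slow call and dump the stats to a file.
cProfile.run("fig.show()", "show_profile")

# Sort by cumulative time; copy.deepcopy and its helpers
# (e.g. _deepcopy_list) show up near the top for large figures.
pstats.Stats("show_profile").sort_stats("cumulative").print_stats(20)
```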

To test this, I went into plotly's basedatatypes.py, traced where the calls were being made, and changed the deepcopy calls to ujson.loads(ujson.dumps(var)).
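The substitution itself boils down to the pattern below. fast_copy is just an illustrative name, not anything in plotly; the round-trip only works because the values being copied here are plain JSON-serializable dicts and lists, which is why it's the kind of change worth verifying on your own figures.

```python
import copy
import ujson

def fast_copy(var):
    # Stand-in for the copy.deepcopy(var) calls in plotly's
    # basedatatypes.py. Only valid for JSON-serializable data:
    # tuples come back as lists, and non-JSON types will raise.
    return ujson.loads(ujson.dumps(var))

spec = {"type": "choroplethmapbox", "z": [1.0, 2.5, 3.2]}
assert fast_copy(spec) == copy.deepcopy(spec)
```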

Here are the results of my test:
[image: test results]

I'm still testing, but results don't seem to be adversely impacted by the changes so far, and it has helped me work much faster.

How large are the datasets that you tested?

The GeoJSON file is 250 MB, though I'm likely to test bigger ones now. Before, when I used px.choropleth_mapbox(), there was a 50% chance it would generate and a 50% chance it would just remain frozen.