Optimizing writing many (hundreds+) PNGs using Kaleido

Hello, my Python script writes hundreds of high-resolution, multi-subplot charts to PNG using Figure.write_image (with Kaleido). Running cProfile on a single-process version of the script showed that most of the time is spent in Kaleido's C++ module. Currently I execute all Plotly code (including Figure.write_image) in a persistent ProcessPoolExecutor, and I experimented to find the optimal number of worker processes for my machine. I loop over my data and submit hundreds of Figure.write_image jobs to the executor. Submission is nearly instantaneous, and then my CPU stays almost fully pegged for tens of minutes while the jobs are dequeued and executed.
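For reference, the setup described above (one persistent pool, every figure built and written inside a worker, all jobs submitted up front) looks roughly like the sketch below. It is illustrative only. The job dict layout, the helper names render_chart, render_all and main, the subplot contents, the width/height/scale values and the worker count of 8 are all hypothetical placeholders, not my actual code. The executor is passed into render_all so the same submit-and-collect loop works with any concurrent.futures executor.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, as_completed
from typing import Any, Callable, Dict, List

# Hypothetical job description: an output path plus the data for one chart.
Job = Dict[str, Any]


def render_chart(job: Job) -> str:
    """Build one multi-subplot figure and write it to PNG (runs in a worker)."""
    # Imported here so only the rendering path needs plotly + kaleido installed.
    import plotly.graph_objects as go
    from plotly.subplots import make_subplots

    series = job["series"]  # hypothetical: list of (x, y) pairs, one per subplot
    fig = make_subplots(rows=len(series), cols=1)
    for row, (x, y) in enumerate(series, start=1):
        fig.add_trace(go.Scatter(x=x, y=y), row=row, col=1)
    # Placeholder export size for a "high resolution" PNG.
    fig.write_image(job["path"], width=1600, height=400 * len(series), scale=2)
    return job["path"]


def render_all(executor: Executor, jobs: List[Job],
               render_fn: Callable[[Job], str] = render_chart) -> List[str]:
    """Submit every job up front, then collect results as workers finish."""
    futures = [executor.submit(render_fn, job) for job in jobs]
    written = []
    for future in as_completed(futures):
        # result() re-raises any exception from the worker, so failures surface.
        written.append(future.result())
    return sorted(written)


def main(max_workers: int = 8) -> None:
    # Hypothetical input standing in for the real data loop.
    jobs = [
        {"path": f"chart_{i:04d}.png",
         "series": [([0, 1, 2], [i, i + 1, i + 2]), ([0, 1, 2], [2, 1, 0])]}
        for i in range(300)
    ]
    # One persistent pool for the whole run; workers are reused across jobs.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        paths = render_all(pool, jobs)
    print(f"wrote {len(paths)} PNGs")
```

main() would be called from under an if __name__ == "__main__": guard, which ProcessPoolExecutor needs on platforms that start workers by spawning a fresh interpreter.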

Does anyone have experience with, or recommendations for, further optimizing the performance of these write_image operations? Is there a particular way to structure my use of Plotly to speed up a large number of render operations, for example with respect to timing or chart layout? Thanks for any help!!