I have a Jupyter Notebook (.ipynb) file that contains a Matplotlib bar plot and a histogram. The file is about 100 KB in size. The same plots made with Plotly Express produce a notebook of 25.5 MB.
The JSON export of the histogram is 3 MB while the barplot export is about 7 KB.
My questions:
- Why is the Plotly histogram JSON more than 400 times bigger than the bar plot JSON?
- Why is the Plotly notebook more than 8 times bigger than the two exported JSON images together?
- Why is the Plotly notebook more than 250 times bigger than the Matplotlib notebook?
Hi @thorsten,
My first guess would be that you are plotting a histogram with a fairly large number of points. The thing is that when you plot a histogram via Plotly, it stores all the original data in the JSON file and computes the bins and counts on the JavaScript side. If you want to shrink the size of your plot, you can calculate the bins and counts with NumPy before you feed the data to a bar plot. Like this:
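import plotly.express as px
import numpy as np
df = px.data.tips()
# compute the counts per bin in Python instead of in the browser
counts, bins = np.histogram(df.total_bill, bins=range(0, 60, 5))
# replace the bin edges with the bin centers, used as bar positions
bins = 0.5 * (bins[:-1] + bins[1:])
fig = px.bar(x=bins, y=counts, labels={'x':'total_bill', 'y':'count'})
# the figure now holds one value per bin rather than one per data row
len(fig.data[0].x)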
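To get a feel for the size difference without Plotly installed, here is a rough sketch using only the standard library. The two dicts are simplified stand-ins for a histogram trace and a bar trace, not Plotly's actual figure schema, and the random data, the 100,000-value sample size, and the bin layout are assumptions made for illustration.

```python
import json
import random

random.seed(0)
# Hypothetical stand-in data: 100,000 values between 0 and 60
values = [round(random.uniform(0, 60), 2) for _ in range(100_000)]

# Simplified stand-in for a histogram trace: every raw value is stored
histogram_trace = {"type": "histogram", "x": values}

# Pre-binned version: 12 bins of width 5 covering 0 to 60
bin_width = 5
n_bins = 12
counts = [0] * n_bins
for v in values:
    # clamp so a value of exactly 60 lands in the last bin
    idx = min(int(v // bin_width), n_bins - 1)
    counts[idx] += 1
centers = [bin_width * (i + 0.5) for i in range(n_bins)]

# Simplified stand-in for a bar trace: one x and one y per bin
bar_trace = {"type": "bar", "x": centers, "y": counts}

histogram_bytes = len(json.dumps(histogram_trace))
bar_bytes = len(json.dumps(bar_trace))
print(histogram_bytes, bar_bytes)
```

The serialized histogram grows with the number of data points, while the pre-binned version only grows with the number of bins.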
hope this helps, Alex-
@Alexboiboi thanks, very interesting and helpful.
You are right, the dataset is pretty large (1.5 million values). Doing the counting in Python (NumPy or Pandas) decreases the notebook size from 25.5 MB to 3.5 MB and the time for the "histogramming" from 7.5 s to 0.5 s. (Matplotlib takes about 2.3 seconds to do the counting.)
Nevertheless the exported Plotly (JSON) file is 3 MB in size while the notebook is 22 MB. I would expect the notebook to be JSON + JavaScript (maybe 4 MB), so that still doesn't add up.
The reason for the size difference seems to be that Plotly minifies the JSON export (verified by removing the JavaScript code from the notebook file and minifying the rest).
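A small sketch of how much whitespace alone can add, using only the standard library. It assumes the notebook file is written as indented JSON (here `indent=1`, so each array element sits on its own line), which is an assumption about the notebook writer and not something measured in this thread. The payload is a made-up stand-in, so the ratio it prints will not match the numbers above.

```python
import json

# Hypothetical payload: one trace with a long numeric array
payload = {"data": [{"type": "histogram", "x": [i * 0.25 for i in range(50_000)]}]}

# Indented output: one array element per line, plus leading spaces
pretty = json.dumps(payload, indent=1)

# Minified output: no spaces after separators, no line breaks
compact = json.dumps(payload, separators=(",", ":"))

# Both strings decode to the same data; only the whitespace differs
assert json.loads(pretty) == json.loads(compact)

ratio = len(pretty) / len(compact)
print(len(pretty), len(compact), round(ratio, 2))
```

The shorter the individual values, the larger the share of the file that is whitespace, so arrays of small numbers inflate the most.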
Thanks again, Thorsten