Huge size of histogram and Jupyter Notebook file

I have a Jupyter Notebook (.ipynb) file that contains a Matplotlib bar plot and a histogram. The file is about 100 KB. The same plots made with Plotly Express produce a notebook of 25.5 MB.

The JSON export of the Plotly histogram is 3 MB, while the bar plot export is about 7 KB.

My questions:

  • Why is the Plotly histogram JSON more than 400 times bigger than the bar plot JSON?

  • Why is the Plotly notebook more than 8 times bigger than the two exported JSON files combined?

  • Why is the Plotly notebook more than 250 times bigger than the Matplotlib notebook?
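For reference, the ratios in these questions follow directly from the reported sizes. A quick check, assuming decimal units (1 MB = 1000 KB); the variable names are just labels for the numbers above:

```python
# Reported file sizes in KB, assuming 1 MB = 1000 KB (decimal units).
histogram_json_kb = 3_000     # exported Plotly histogram JSON, ~3 MB
barplot_json_kb = 7           # exported Plotly bar plot JSON, ~7 KB
plotly_notebook_kb = 25_500   # Plotly notebook, 25.5 MB
matplotlib_notebook_kb = 100  # Matplotlib notebook, ~100 KB

ratio_hist_vs_bar = histogram_json_kb / barplot_json_kb
ratio_nb_vs_exports = plotly_notebook_kb / (histogram_json_kb + barplot_json_kb)
ratio_plotly_vs_mpl = plotly_notebook_kb / matplotlib_notebook_kb

print(f"histogram JSON / bar plot JSON: {ratio_hist_vs_bar:.0f}x")
print(f"Plotly notebook / both JSON exports: {ratio_nb_vs_exports:.1f}x")
print(f"Plotly notebook / Matplotlib notebook: {ratio_plotly_vs_mpl:.0f}x")
```

This gives roughly 429x, 8.5x and 255x; with binary units (1 MB = 1024 KB) the numbers shift slightly but the conclusions stay the same.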

Hi @thorsten,

My first guess would be that you are plotting a histogram with a fairly large number of points. When you plot a histogram with Plotly, the figure JSON stores all the original data points, and the binning and counting happen on the JavaScript side. To shrink the plot, you can compute the bins and counts with NumPy first and then pass them to a bar plot, like this:

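import plotly.express as px
import numpy as np

df = px.data.tips()
# count the values in fixed-width bins of 5 (edges 0, 5, ..., 55)
counts, bins = np.histogram(df.total_bill, bins=range(0, 60, 5))
# turn the bin edges into bin centres for the x axis
bins = 0.5 * (bins[:-1] + bins[1:])

fig = px.bar(x=bins, y=counts, labels={'x':'total_bill', 'y':'count'})
# the trace now holds one x value per bin (11) instead of one per row
len(fig.data[0].x)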
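To see why binning before plotting shrinks the figure, here is a minimal sketch that does not use Plotly at all. It only compares the size of a JSON payload holding every raw value (roughly what a histogram trace has to carry, per the explanation above) with one holding just bin centres and counts (what the bar trace carries). The random sample, seed and bin layout are made up for illustration:

```python
import json
import random
from collections import Counter

random.seed(0)
# Hypothetical data: 100,000 values standing in for one large numeric column.
raw = [random.uniform(0, 55) for _ in range(100_000)]

# Payload 1: every raw value, as a histogram trace would store it.
raw_payload = json.dumps({"x": raw})

# Payload 2: 11 bins of width 5 on [0, 55], counted in Python before plotting.
width, n_bins = 5, 11
counts = Counter(min(int(v // width), n_bins - 1) for v in raw)
bin_counts = [counts.get(i, 0) for i in range(n_bins)]
centres = [width * i + width / 2 for i in range(n_bins)]
binned_payload = json.dumps({"x": centres, "y": bin_counts})

print(len(raw_payload), len(binned_payload))
```

The raw payload grows linearly with the number of points, while the binned one depends only on the number of bins, which is why pre-binning matters so much for a column with millions of values.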

Hope this helps, Alex

@Alexboiboi thanks, very interesting and helpful.

You are right, the dataset is pretty large (1.5 million values). Doing the counting in Python (NumPy or Pandas) reduces the notebook size from 25.5 MB to 3.5 MB and the time to build the histogram from 7.5 s to 0.5 s. (Matplotlib takes about 2.3 s to do the counting.)
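If NumPy is not at hand, the counting step can also be sketched in plain Python. The helpers below are hypothetical, not part of any library; they assume sorted bin edges and copy np.histogram's edge convention (half-open bins, with the last bin also including its right edge). A pure-Python loop will be far slower than NumPy on 1.5 million values, so this only illustrates what the vectorised call computes:

```python
from bisect import bisect_right


def histogram(values, edges):
    """Count values into bins defined by sorted edges, mimicking np.histogram:
    bins are [edges[i], edges[i+1]) except the last, which is closed on the
    right; values outside [edges[0], edges[-1]] are ignored."""
    counts = [0] * (len(edges) - 1)
    lo, hi = edges[0], edges[-1]
    for v in values:
        if v < lo or v > hi:
            continue
        i = bisect_right(edges, v) - 1
        if i == len(counts):  # v == hi belongs to the last bin
            i -= 1
        counts[i] += 1
    return counts


def bin_centres(edges):
    """Midpoint of each bin, used as the x value of a bar plot."""
    return [0.5 * (a + b) for a, b in zip(edges[:-1], edges[1:])]


print(histogram([1, 4, 5, 9, 10, 12], list(range(0, 15, 5))))  # → [2, 3]
```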

Nevertheless, the exported Plotly JSON file is 3 MB while the notebook is 22 MB. I would expect the notebook to be roughly the JSON plus the JavaScript (maybe 4 MB), so that still doesn't add up.
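One way to see where the extra megabytes sit is to measure each output payload inside the .ipynb file, which is itself JSON, grouped by MIME type. The sketch below assumes the nbformat 4 layout (cells, then outputs, then a `data` dict keyed by MIME type); the function name and the file name in the usage comment are hypothetical. JSON-typed payloads are measured in minified form, so they may come out smaller than they are on disk:

```python
import json
from collections import defaultdict


def output_sizes_by_mime(nb):
    """Sum the size (in characters) of every output payload, grouped by
    MIME type, for a notebook dict in nbformat 4 layout."""
    sizes = defaultdict(int)
    for cell in nb.get("cells", []):
        for output in cell.get("outputs", []):
            for mime, payload in output.get("data", {}).items():
                if isinstance(payload, list):     # text stored as a list of lines
                    payload = "".join(payload)
                if not isinstance(payload, str):  # JSON MIME types stored as objects
                    payload = json.dumps(payload, separators=(",", ":"))
                sizes[mime] += len(payload)
    return dict(sizes)


# Usage with a real file (hypothetical name):
# with open("plots.ipynb", encoding="utf-8") as f:
#     print(output_sizes_by_mime(json.load(f)))
```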

The reason for the size difference seems to be that Plotly minifies the JSON export (I verified this by removing the JavaScript code from the notebook file and minifying the rest).
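The effect of minification is easy to reproduce with Python's standard json module: the same object serialised with indentation and with compact separators differs only in whitespace. The figure-like dict below is made up, and `indent=1` is just an assumed pretty-printing style; the actual savings on a real notebook depend on how it was written:

```python
import json
import random

random.seed(1)
# Hypothetical figure-like payload: one trace with 10,000 numeric points.
fig_dict = {"data": [{"type": "bar", "x": [random.random() for _ in range(10_000)]}]}

pretty = json.dumps(fig_dict, indent=1)                 # one value per line, indented
minified = json.dumps(fig_dict, separators=(",", ":"))  # no optional whitespace

print(len(pretty), len(minified), round(len(pretty) / len(minified), 2))
```

Both strings decode to the same object; only the whitespace, and therefore the file size, differs.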

Thanks again, Thorsten