I have a Jupyter Notebook (.ipynb) file that contains a Matplotlib barplot and histogram. The file is about 100 KB in size. The same notebook with the plots done in Plotly Express is 25.5 MB in size.
The JSON export of the histogram is 3 MB while the barplot export is about 7 KB.
My questions:
- Why is the Plotly histogram JSON more than 400 times bigger than the barplot JSON?
- Why is the Plotly notebook more than 8 times bigger than the two exported JSON files together?
- Why is the Plotly notebook more than 250 times bigger than the Matplotlib notebook?
Hi @thorsten,
My first guess would be that you are plotting a histogram with a fairly large number of points. When you plot a histogram via Plotly, it stores all the original data in the JSON file and computes the bins and counts on the JavaScript side. If you want to shrink the size of your plot, you can calculate the bins and counts with NumPy before you feed the data to a bar plot. Like this:
import plotly.express as px
import numpy as np
df = px.data.tips()
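# compute the counts and the bin edges on the Python side
counts, bins = np.histogram(df.total_bill, bins=range(0, 60, 5))
# convert the bin edges to bin centres, which is where the bars are drawn
bins = 0.5 * (bins[:-1] + bins[1:])
fig = px.bar(x=bins, y=counts, labels={'x':'total_bill', 'y':'count'})
# the trace now holds one x value per bin instead of one per data point
len(fig.data[0].x)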
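To get a feel for the size difference without Plotly in the loop, here is a rough sketch using only the standard library. The dicts are simplified stand-ins for a trace, not Plotly's actual figure schema, and the 100,000 random values and the bin width of 5 are made up for illustration.

```python
import json
import random

# Hypothetical data: 100,000 values standing in for the real dataset.
random.seed(0)
values = [round(random.gauss(20.0, 8.0), 2) for _ in range(100_000)]

# A client-side histogram has to carry every raw value.
raw_trace = {"type": "histogram", "x": values}

# A pre-binned bar trace carries one centre and one count per bin.
bin_width = 5.0
counts = {}
for v in values:
    left_edge = (v // bin_width) * bin_width
    counts[left_edge] = counts.get(left_edge, 0) + 1
edges = sorted(counts)
binned_trace = {
    "type": "bar",
    "x": [e + bin_width / 2 for e in edges],
    "y": [counts[e] for e in edges],
}

raw_size = len(json.dumps(raw_trace))
binned_size = len(json.dumps(binned_trace))
print(f"raw: {raw_size} bytes, binned: {binned_size} bytes")
```

The raw payload grows with the number of data points, while the binned one only grows with the number of bins.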
Hope this helps, Alex
@Alexboiboi thanks, very interesting and helpful.
You are right, the dataset is pretty large (1.5 million values). Doing the counting in Python (NumPy or Pandas) decreases the notebook size from 25.5 to 3.5 MB and the time to do the "histogramming" from 7.5 s to 0.5 s. (Matplotlib takes about 2.3 seconds to do the counting.)
Nevertheless the exported Plotly (JSON) file is 3 MB in size while the notebook is 22 MB. I would expect the notebook to be JSON + JavaScript (maybe 4 MB), so that still doesn't add up.
The reason for the size difference seems to be that Plotly minifies the JSON export (verified by removing the JavaScript code from the notebook file and minifying the rest).
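A small sketch of the effect, using only the standard library. The payload is a made-up figure-like dict, and `indent=1` is my assumption about how the notebook file is pretty-printed; the exact ratio will depend on the real data.

```python
import json

# Hypothetical payload: a figure-like dict with one long numeric array.
figure = {"data": [{"type": "histogram", "x": list(range(10_000))}], "layout": {}}

# Compact form, similar in spirit to a minified export.
minified = json.dumps(figure, separators=(",", ":"))

# Indented form (assumption: the notebook writer pretty-prints with indent=1),
# which puts every array element on its own indented line.
indented = json.dumps(figure, indent=1)

print(len(minified), len(indented), round(len(indented) / len(minified), 2))
```

Both strings decode to the same object; only the whitespace differs, and for long arrays that whitespace is paid once per element.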
Thanks again, Thorsten
Sorry for the necropost, but this was really helpful to me, thanks @Alexboiboi!
I wanted to do something similar to your example, but also wanted to use the "color" parameter of the px.histogram function, so I wrote my own function to bin the histogram manually, but plot multiple colors as a stacked bar plot. Posting it here in case it helps anyone in the future:
import numpy as np
import pandas as pd
import plotly.graph_objects as go

def plotly_histogram(df: pd.DataFrame, data_column: str, colour_column: str, nbins: int) -> go.Figure:
    """
    Create a stacked histogram with Plotly using data from a DataFrame.

    Args:
        df (pd.DataFrame): The DataFrame containing the data.
        data_column (str): The column containing the data to plot.
        colour_column (str): The column used to distinguish groups for stacking.
        nbins (int): Number of bins for the histogram.

    Returns:
        go.Figure: A Plotly Figure object representing the histogram.
    """
    fig = go.Figure()
    groups = df.groupby(colour_column)
    # Shared bin edges for all traces; nbins + 1 edges delimit nbins bins
    data_bins = np.linspace(df[data_column].min(), df[data_column].max(), nbins + 1)
    width = data_bins[1] - data_bins[0]
    # Bars are centred on their x value, so plot at the bin centres
    centres = 0.5 * (data_bins[:-1] + data_bins[1:])
    # Loop through each group (colour)
    for colour, group in groups:
        counts, _ = np.histogram(group[data_column], bins=data_bins)
        # Add trace for each group with corresponding color
        fig.add_trace(go.Bar(
            x=centres,
            y=counts,
            name=f'Group {colour}',
            width=width
        ))
    # Formatting
    fig.update_layout(
        barmode="stack",
        title="Stacked Histogram",
        xaxis_title=data_column,
        yaxis_title="Frequency",
    )
    fig.update_traces(marker_line_width=0)
    return fig
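Two things are worth checking when binning by hand for a stacked plot: np.linspace returns edges, so n edges give n - 1 bins, and the per-group counts only add up to the overall histogram if every group uses the same edges. A quick NumPy-only sketch with made-up data (the two groups and their parameters are purely illustrative):

```python
import numpy as np

# Hypothetical data: two groups of values sharing one set of bin edges.
rng = np.random.default_rng(0)
group_a = rng.normal(20.0, 5.0, size=1_000)
group_b = rng.normal(30.0, 5.0, size=1_000)
all_values = np.concatenate([group_a, group_b])

nbins = 10
# nbins + 1 edges are needed to delimit nbins bins.
edges = np.linspace(all_values.min(), all_values.max(), nbins + 1)
centres = 0.5 * (edges[:-1] + edges[1:])

counts_a, _ = np.histogram(group_a, bins=edges)
counts_b, _ = np.histogram(group_b, bins=edges)
counts_all, _ = np.histogram(all_values, bins=edges)

# Stacking the per-group counts reproduces the overall histogram.
print(len(centres), np.array_equal(counts_a + counts_b, counts_all))
```

If each group computed its own edges from its own min and max, the bars would no longer line up and the stack would not match the overall histogram.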