Opening a Large HTML File

I created a violin plot from 6 columns of a pandas DataFrame with 6M rows. I saved it as an HTML file because it would not render in Jupyter Notebook. The resulting file is 207 MB. My questions are:

  1. Is that file size normal for such a small data set; and
  2. I have tried unsuccessfully to open the file in Chrome, Safari, Firefox, and Brave. The page load seems to time out; in Safari I can watch it load and then stall out. Is the problem the browser or something about the file itself?

TIA for any guidance.

Hi! Wow… what a large HTML file! I’d love to try it. Is there any chance you could share the code and data?
Does 6M refer to thousands or millions of rows? I think this is worth trying with Polars… what do you think?

This data set with 5 million rows of random integers is a good approximation of what I’m working with.

pd.DataFrame(np.random.randint(0, 10000, size=(5066444, 8)))

When I run the following code, the HTML file comes out at 193 MB. The figure also will not render in Jupyter.

import numpy as np
import pandas as pd
import plotly.graph_objects as go

df = pd.DataFrame(np.random.randint(0,10000,size=(5066444,8)))

fig = go.Figure()

cols = df.columns 

for col in cols:
    fig.add_trace(go.Violin(y=df[col],
                            name=col,
                            box_visible=True,
                            meanline_visible=True))

fig.write_html('large.html')

Ok! Nice, I’ll try it!! And I’ll definitely go with Polars and NumPy if the data type allows it.

  1. I first tried building the figure with lists (in the violin, y=df[col].to_list(), to make it compatible with Polars); it ran for more than an hour before I interrupted the kernel. So,
  2. I switched from .to_list() to NumPy’s .to_numpy(), and the change was dramatic, so much so that I started timing it with different array sizes:

df2 = pl.DataFrame(rng.integers(0, 10000, (100444, 8)))   # Execution Time: 0.77 seconds
df2 = pl.DataFrame(rng.integers(0, 10000, (500444, 8)))   # Execution Time: 3.56 seconds
df2 = pl.DataFrame(rng.integers(0, 10000, (1000444, 8)))  # Execution Time: 7.84 seconds
df2 = pl.DataFrame(rng.integers(0, 10000, (5066444, 8)))  # Execution Time: 13.65 seconds

  3. Same as you, on the largest run Jupyter didn’t render the figure, so I only saved it; just to say, it came to around 197 MB. Rendering on my laptop is slow, feeling the pain (it’s just an Intel i5). The image is from Firefox (large.html).
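By the way, that ~197 MB is roughly what the raw numbers alone would predict, since Plotly embeds every y value as JSON text in the HTML file. A quick back-of-the-envelope check (just a sketch, assuming the 5,066,444 × 8 test frame above and about five characters per serialized integer):

```python
rows, cols = 5_066_444, 8   # the test frame used above
values = rows * cols        # ~40.5 million numbers in total

# A 0-9999 integer plus its separator is roughly five characters of JSON.
bytes_per_value = 5
data_mb = values * bytes_per_value / 1e6
print(f"~{data_mb:.0f} MB of embedded data")  # → ~203 MB of embedded data
```

So a file in the 190–210 MB range is about what you’d expect for this much data; the size is dominated by the data itself, not by Plotly overhead.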

Anyway, just to say… I’ll keep trying Polars with more than 1 million rows.

Code here:
import polars as pl
import numpy as np
import time
import plotly.graph_objects as go

rng = np.random.default_rng(seed=42)

start_time = time.perf_counter()

df2 = pl.DataFrame(rng.integers(0, 10000, (5066444,8)))

# Initialize a figure
fig = go.Figure()

# Iterate through columns and add violin traces
for col in df2.columns:
    fig.add_trace(go.Violin(
        y=df2[col].to_numpy(),  # Convert Polars to numpy arrays
        name=col,
        box_visible=True,
        meanline_visible=True
    ))

# Show plot
# fig.show()
fig.write_html('large.html')

# End timer
end_time = time.perf_counter()
print(f"Execution Time: {end_time - start_time:.2f} seconds")

Nice. So the question is:

Does the HTML file have to be that large? And if so, how can I open it? I want to be able to share my results, but if the figure can’t be opened in a Jupyter notebook or an HTML file, then what are my options?

Btw, I use Polars for a lot of big data processing b/c pandas is no good to me at 300 million rows, and Dask gives me problems when writing to .parquet, but that’s another story for another time.

These are ChatGPT-4o’s recommendations (it’s up to you…):

Solutions to Open or Reduce the File Size

:one: Use a Lightweight Web Server (Avoid Browser Memory Issues)

Instead of opening it directly in a browser, serve it with a local server:

python -m http.server 8000

Then open http://localhost:8000 in your browser. Serving the file locally can sometimes handle large files better than opening them directly.


:two: Reduce the File Size (Recommended)

If you still have the original dataset, regenerate the file with these optimizations:

:white_check_mark: A. Reduce Data Points (Downsampling)
Use Polars to downsample before plotting:

df_sampled = df.sample(n=100_000)  # Reduce data size before saving

Then regenerate the Plotly figure.
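A sketch of what that downsampling could look like for one column, using plain NumPy (the names `values` and the 100k sample size here are just for illustration). Since a violin is a density estimate, a large random sample usually preserves its shape closely:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
values = rng.integers(0, 10000, 5_066_444)  # stand-in for one column of the data

# Sample 100k points without replacement before handing them to go.Violin
idx = rng.choice(values.size, size=100_000, replace=False)
sampled = values[idx]

print(sampled.size)  # → 100000
```

The violin built from `sampled` should look nearly identical to the full-data one, at a small fraction of the file size.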

:white_check_mark: B. Save Without Data (include_plotlyjs="cdn" and full_html=False)
When saving with Plotly:

fig.write_html("plot.html", include_plotlyjs="cdn", full_html=False)
  • include_plotlyjs="cdn" loads the Plotly library from the web instead of embedding it in the file (saves a few MB).
  • full_html=False writes only the chart’s <div> fragment instead of a complete HTML document, so you can embed it in another page.

:white_check_mark: C. Save as JSON Instead of HTML
If you only need the data and want to re-render the plot later:

fig.write_json("plot.json")

Then reload it later:

import plotly.io as pio
fig = pio.read_json("plot.json")
fig.show()

:three: Use an Interactive Jupyter Notebook Instead >> (I won’t go for this one :roll_eyes:…)

If you’re working in Jupyter Notebook, try displaying the plot inside the notebook instead of saving it as an HTML file:

import plotly.io as pio
pio.renderers.default = "notebook"
fig.show()

:four: Open in a Powerful Browser (Edge, Firefox, or Brave)

Some browsers (like Chrome) struggle with large HTML files. Try:

  • Edge (handles large files better)
  • Firefox (optimized memory management)
  • Brave (better handling of large data)

What Should You Do Now?

  1. :white_check_mark: Try opening it with python -m http.server 8000.
  2. :white_check_mark: If it still doesn’t open, regenerate it with full_html=False and include_plotlyjs="cdn".
  3. :white_check_mark: If it’s still too large, downsample your data (df.sample(n=100_000)) before plotting.

Maybe options 1 and 2 have something worth trying.
Hopefully it’ll help you… Anyway, I ran it again, but my laptop suffered…