I would like to use Plotly + Dash to inspect a large dataset of vectors describing spectral data, but px.line does not seem suitable for plotting even moderately large datasets. Is there anything that can be done to speed up examining large datasets to the point where callbacks will be functional, or do I need to learn a new tool? Below is a toy example that generates a dataset large enough to make Plotly unusable; you can set n_rows to something small to verify the script is working.
```python
import plotly.express as px
import numpy as np
import pandas as pd

n_vars = 4000
x_range = np.linspace(0, 5*np.pi, n_vars)  # np.pi: np.math was removed in NumPy 2.0
n_rows = 2500
total_x = np.empty_like(x_range, shape = (n_rows, n_vars))
classes = np.ones_like('hello_world', shape = (n_rows))  # fixed-width '<U11' string arrays
ids = np.ones_like('hello_world', shape = (n_rows))
#could vectorize the loop below by generating all of the `choice` at once, then using `np.where`
#generating the data is plenty fast without this though
for i in range(n_rows):
    choice = np.random.randint(low = 0 , high = 2, size = 1)
    if choice == 1:
        response = (np.random.randint(1,10) * np.random.rand()) + np.sin(x_range)
        c = 'sin'
    else:
        response = (np.random.randint(1,10) * np.random.rand()) + np.cos(x_range)
        c = 'cos'
    total_x[i, :] = response
    classes[i] = c
    ids[i] = 'sample' + "_" + str(i)
df = pd.DataFrame(total_x)
df['class'] = classes
df['ids'] = ids
# long format: one row per (sample, index) pair
df_m = df.melt(id_vars = ['class', 'ids'], var_name = 'index', value_name = 'response' )
fig = px.line(data_frame = df_m, x = 'index', y = 'response', line_group = 'ids', color = 'class')
fig.show()
```
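The comment inside the loop suggests drawing every `choice` at once and using `np.where`. A minimal sketch of that vectorized generation follows; it assumes the same sin/cos-plus-random-offset model, uses NumPy's `default_rng` instead of the legacy `np.random` functions, and uses a smaller `n_rows` so it runs quickly (set it back to 2500 to match the toy example).

```python
import numpy as np
import pandas as pd

n_vars = 4000
n_rows = 250  # smaller than the toy example so the sketch runs quickly
x_range = np.linspace(0, 5 * np.pi, n_vars)

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

# Draw every class choice and every offset at once instead of per row.
is_sin = rng.integers(low=0, high=2, size=n_rows) == 1
offsets = rng.integers(1, 10, size=n_rows) * rng.random(n_rows)

# np.where broadcasts the (n_rows, 1) mask against the two (n_vars,) curves,
# picking sin or cos per row; the per-row offset broadcasts the same way.
total_x = offsets[:, None] + np.where(is_sin[:, None], np.sin(x_range), np.cos(x_range))

classes = np.where(is_sin, "sin", "cos")
ids = np.array([f"sample_{i}" for i in range(n_rows)])

df = pd.DataFrame(total_x)
df["class"] = classes
df["ids"] = ids
df_m = df.melt(id_vars=["class", "ids"], var_name="index", value_name="response")
```

The resulting `df_m` has the same long layout as in the loop version, so it can be passed to the same plotting code.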
It seems like the issue is Plotly itself. I re-implemented the plot using Scattergl, with an option to use NumPy to create the filtered vectors that are added with fig.add_trace() in the trace-addition loop. That loop was actually slightly faster using pandas (use_numpy = False), but I do not think that is relevant. Regardless of how fast I make the filtering in the trace-addition loop, once all of the traces are added and fig.show() is called, my "frontend" Python is out of the picture if the plot does not include callbacks, right? The vectors added with fig.add_trace() can be deleted or overridden, so it seems that the data structure used to filter out each vector is not relevant. I am fairly certain this also means that vaex will not solve my problem. I tried to get it to work, but my environment does not have the correct dependency structure. Perhaps you suggested it because you thought the issue was caused by my PC running out of RAM? However, I am on a workstation-level PC with plenty of RAM.
```python
import plotly.express as px
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from tqdm import tqdm
import vaex

n_vars = 4000
x_range = np.linspace(0, 5*np.pi, n_vars)  # np.pi: np.math was removed in NumPy 2.0
n_rows = 5
total_x = np.empty_like(x_range, shape = (n_rows, n_vars))
classes = np.ones_like('hello_world', shape = (n_rows))
ids = np.ones_like('hello_world', shape = (n_rows))
#could vectorize the loop below by generating all of the `choice` at once, then using `np.where`
#generating the data is plenty fast without this though
for i in range(n_rows):
    choice = np.random.randint(low = 0 , high = 2, size = 1)
    if choice == 1:
        response = (np.random.randint(1,10) * np.random.rand()) + np.sin(x_range)
        c = 'sin'
    else:
        response = (np.random.randint(1,10) * np.random.rand()) + np.cos(x_range)
        c = 'cos'
    total_x[i, :] = response
    classes[i] = c
    ids[i] = 'sample' + "_" + str(i)
df = pd.DataFrame(total_x)
df['class'] = classes
df['ids'] = ids
df_m = df.melt(id_vars = ['class', 'ids'], var_name = 'index', value_name = 'response' )
vals = df_m.values  # column order: class, ids, index, response
use_numpy = False
try:
    df_v = vaex.from_pandas(df_m)
except TypeError as e:
    print(f"\nVaex failed with:\n{e}\n")
fig = go.Figure()
for i in tqdm(np.unique(ids)):
    if use_numpy:
        plot_ar = vals[vals[:, 1] == i]
        x = plot_ar[:,2]
        y = plot_ar[:,3]
    else:
        plot_df = df_m.loc[df_m['ids'] == i, :]
        x = plot_df['index'].values
        y = plot_df['response'].values
    fig.add_trace(
        go.Scattergl(
            x = x,
            y = y
        )
    )
    #delete x and y to demonstrate that plotly does not reference the
    #input objects when plotting, but instead consolidates and plots
    #the traces added to the `_data` object maintained by `BaseFigure`
    del x
    del y
fig.update_layout(showlegend=False)
fig.show()
```
What do you consider to be too slow? I just tested your code and the plot was rendered in my browser in about a second. Is one second too slow for you? This should be no problem at all for showing graphs through dcc.Graph. See my example.
FYI, I use Plotly 5.5.0. Additional speed can be gained by installing orjson.
Thanks for your reply; the info that the code ran quickly for you was very useful. Your .gif shows just a couple of traces; I am guessing that it ran reasonably well with many traces as well? My issue turned out to be that WebGL is somehow not finding my GPU (Linux). I've been trying the solutions in some threads and blogs but haven't been able to resolve it yet. When I rebooted the machine into Windows, WebGL worked out of the box, and the same code was reasonably fast with 800 traces. I guess this is resolved as far as Plotly goes, but I'll update this comment once I figure out how to solve my Linux blues.