I am experiencing slow rendering and load times with Scattergl for my (quite large) dataset, so I figured I would try Pointcloud instead. But when I try to use my DataFrame's Date column as the x-axis, it does not work.
My DataFrame is constructed like this:
timestamp | Date | ID | data |
---|---|---|---|
1541516362 | 2018-11-06 15:59:22+01:00 | 51 | 0 |
1541527532 | 2018-11-06 19:05:32+01:00 | 76 | -41 |
⋮ | ⋮ | ⋮ | ⋮ |
1557496108 | 2019-05-10 15:48:28+02:00 | 71 | -15 |
where timestamp is a UTC UNIX timestamp, and the Date column is added by me after creating the DataFrame by reading from a database, like so:
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import pandas as pd
import plotly.graph_objs as go
[...] # Code that reads database into dataframe etc.
tz = "Europe/Oslo"
df["Date"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)
df.Date = df.Date.dt.tz_convert(tz)
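For reference, here is a minimal self-contained sketch of that conversion. It uses only the three rows shown in the table above as stand-in data (an assumption for illustration; the real DataFrame is read from a database), so it can be run without my database:

```python
import pandas as pd

# Stand-in data: the three rows shown in the table above.
df = pd.DataFrame(
    {
        "timestamp": [1541516362, 1541527532, 1557496108],
        "ID": [51, 76, 71],
        "data": [0, -41, -15],
    }
)

tz = "Europe/Oslo"
# Interpret the integers as seconds since the UNIX epoch, in UTC ...
df["Date"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)
# ... then convert the tz-aware values to local time.
df["Date"] = df["Date"].dt.tz_convert(tz)

print(df["Date"].dtype)
print(df[["timestamp", "Date"]])
```

The resulting Date column is timezone-aware, which may be relevant to how the values are passed on to the plot.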
I have an html.Div in the app layout of the form:
html.Div(
    id="mhp-graph-div",
    children=dcc.Graph(
        style={"bgcolor": "rgba(0,0,0,0)"},
        id="mhp-graph",
        config={"scrollZoom": True}
    ),
)
which I update from a dcc.Dropdown in another html.Div, like so:
@app.callback(Output("mhp-graph", "figure"),
              [Input("live-dropdown", "value")])
def update_live_graph(value):
    fig = {
        "data": [
            go.Pointcloud(
                x=df[df["ID"] == value]["Date"],
                y=df[df["ID"] == value]["data"],
                marker={"sizemin": 0.5, "sizemax": 100},
            )
        ]
    }
    return fig
However, when I run this code, nothing gets plotted: the date axis runs from early 2000 to early 2001, with no data points shown. If I use the timestamp column as the x-axis instead, there are no problems and it works like a charm. The DataFrame is sorted by ascending timestamp/date, and the first and last rows in the table above are the first and last rows in the dataset (late 2018 to May this year).
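To make the sorting and date-range claim easy to check, here is a small sketch of the sanity check. It again uses only the three timestamps from the table above as stand-in data, and the names is_sorted, first_date and last_date are just illustrative:

```python
import pandas as pd

# Stand-in data: the three timestamps shown in the table above.
df = pd.DataFrame({"timestamp": [1541516362, 1541527532, 1557496108]})
df["Date"] = (
    pd.to_datetime(df["timestamp"], unit="s", utc=True)
    .dt.tz_convert("Europe/Oslo")
)

# The rows should already be in ascending timestamp order.
is_sorted = df["timestamp"].is_monotonic_increasing

# The first and last rows bound the date range of the dataset.
first_date = df["Date"].iloc[0]
last_date = df["Date"].iloc[-1]

print(is_sorted, first_date, last_date)
```

On this sample the dates fall in late 2018 and May 2019, nowhere near the 2000 to 2001 range shown on the axis.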
Can anyone help with this issue?
EDIT: I also just noticed that loading the graph with Scattergl using the Date column takes approximately 7 times longer than loading it with the timestamp column. Is there really such a performance difference between dates and uint32? And is there a way to get around this?
Image of the plot when using the Date column
Image of the plot when using the timestamp column