Weird Scatter behaviour, multiple curves drawn from one set of data

Hi,
I have been developing a Python Dash interface. One part is a dropdown that triggers the drawing of curves on a figure with go.Scatter(). Sometimes it draws the same curve twice, horizontally shifted.

Sometimes when I change the dropdown value, the new values overlap the older ones.

I monitor what is going on with logging, and the data reported by the logger show that the figure holds only a single set of data.

Another strange thing is that all the modes are 'lines+markers', but some parts of the graph become lines only…

A minimal working example is quite tricky without data, but this is what the code looks like.

def init_farm_plot():
    '''
    Initialise the curve plot for individual farms
    '''
    fig_p = go.Figure()
    fig_p.add_trace(go.Scatter(
        x=[],
        y=[],
        mode='lines+markers',
        name='Biomass',
        yaxis='y1',
        ))
    fig_p.add_trace(go.Scatter(
        x=[],
        y=[],
        mode='lines+markers',
        name='lice count',
        yaxis='y2',
        ))
    # trying to add empty traces for legend entries (didn't work; see the note after this snippet)
    fig_p.add_trace(go.Scatter(x=[], y=[], mode='lines',
        line=dict(color='#f89406', dash='dash'),
        name='modelled lice infestation', yaxis='y2'))
    fig_p.add_trace(go.Scatter(x=[], y=[], mode='lines',
        line=dict(color='firebrick', dash='dash'),
        name='Average lice infestation', yaxis='y2'))
    fig_p.add_shape(type='line', xref='paper', 
        x0=0, y0=0.5, x1=1, y1=0.5,
        line=dict(color='#f89406', dash='dash'),
        yref='y2'
        )
    for y in range(2003,2022):
        fig_p.add_vline(x=datetime(year=y, month=5, day=1), line=dict(color='green', dash='dash'))
    fig_p.update_layout(
        yaxis=dict(title='Recorded fish farm biomass (tons)',
                   showgrid=False),
        yaxis2=dict(title='Reported average lice/fish',
                    overlaying='y',
                    side='right',
                    showgrid=False),
        margin=dict(b=15, l=15, r=5, t=5),
        template=template  # theme template defined elsewhere in the app
        )
    logger.debug(fig_p)
    return fig_p
 

def mk_farm_evo(fig_p, name, times):
    '''
    Plot individual farm biomass and lice counts
    '''
    # overwrite the two placeholder traces created in init_farm_plot
    fig_p['data'][0] = go.Scatter(x=times,
        y=farm_data[name]['biomasses'],
        mode='lines+markers',
        name='Biomass',
        yaxis='y1')
    fig_p['data'][1] = go.Scatter(x=lice_data.time.values,
        y=farm_data[name]['lice data'],
        mode='lines+markers',
        name='lice count',
        yaxis='y2')
    return fig_p  # the callback below uses this return value

dcc.Graph(
    id='progress-curves',
    figure=init_farm_plot()
),

@app.callback(
    [
        Output('progress-curves', 'figure'),
        Output('farm_layout', 'children'),
    ],
    Input('dropdown_farms', 'value'),
    State(ThemeSwitchAIO.ids.switch("theme"), "value"),
    State('progress-curves', 'figure'),
    log=True
)
def farm_inspector(name, toggle, fig_p, dash_logger: DashLogger):
    template = template_theme1 if toggle else template_theme2    
    logger.debug(f'curve name: {name}') 
    if not name:
        dash_logger.warning('No farm selected', autoClose=autocl)
        raise PreventUpdate
    else:
        dash_logger.info(f'Computing curve for {name}', autoClose=autocl)
        curves = mk_farm_evo(fig_p, name, times)
        curves['layout']['template'] = mk_template(template)
        logger.debug(curves)
        return curves, mk_farm_layout(name, marks_biomass,marks_lice)


if __name__ == '__main__':
    app.run_server(host='0.0.0.0', port=8050, debug=True)
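
On the 'empty trace for legend' comment in init_farm_plot: a workaround often suggested for legend-only entries is to give the dummy trace a single None point instead of empty arrays. A sketch (add_legend_only_trace is a hypothetical helper, not part of the app above):

def add_legend_only_trace(fig_p):
    # a single null point draws nothing, but still registers a legend entry
    fig_p.add_trace(go.Scatter(
        x=[None], y=[None],
        mode='lines',
        line=dict(color='#f89406', dash='dash'),
        name='modelled lice infestation',
        yaxis='y2'))
    return fig_p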

I am using dash 2.7.1 and plotly 5.11.

I have written a new function that checks that an older trace is not included by mistake and generates a random colour for the first trace. But it does not change this behaviour: the two traces keep the same colour. The new 'lines+markers' trace comes with a former trace plotted on the same y axis. It is hard to understand, because the figure is created from scratch each time and is not stored. Also, when I check the lengths of the data going in, they are always the same.

If I could understand where the old trace manages to get stored, I might find how to flush it. I really am puzzled.
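
For reference, one way to rule out any stale state would be to rebuild the figure from scratch in the callback instead of mutating the dict coming back from State. A minimal sketch (fresh_farm_figure is a hypothetical helper, reusing the farm_data and lice_data structures from the code above):

def fresh_farm_figure(name, times):
    fig_p = init_farm_plot()  # brand-new figure, only the empty traces
    fig_p.update_traces(selector=dict(name='Biomass'),
                        x=times,
                        y=farm_data[name]['biomasses'])
    fig_p.update_traces(selector=dict(name='lice count'),
                        x=lice_data.time.values,
                        y=farm_data[name]['lice data'])
    return fig_p  # nothing from the previous render can survive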

@Emil, this behaviour is somehow associated with dash-extensions and the dual y axis, I think.
I removed the ServersideOutput and put the standard Store back, and the bug disappeared. This was on a production K8s cluster.

Are you using more than one pod? Are you using disk caching?

No, it is deployed with several replicas and with disk cache.
Notice the cache is for the dictionary that provides the data to the curves, not for the curves themselves.

However, in my local testing, directly in Python, the behaviour is the same.
If I unselect and reselect the first trace, you don't see the bug anymore, I believe because the viewport of the real trace and the remanent traces are the same. If you update with a new trace, you can see one or several old traces. There is also a shift, which I believe comes from errors when resizing according to the second axis. It is strange because the shift in the x axis is the same.
I have tried to catch it in multiple ways, but I really have no idea what is going on, as the state of the figure does not show these remanent data.
Cheers

Using disk cache makes the app stateful, which means you cannot use multiple replicas. If you need multiple replicas, you must use a cache external to the app itself, e.g. a shared Redis instance.
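
Roughly like this, assuming the RedisStore backend that dash-extensions ships (a sketch only; the host and port are placeholders, so check the API of your dash-extensions version):

from dash_extensions.enrich import DashProxy, ServersideOutputTransform, RedisStore

# every replica reads and writes the same shared cache instead of a local disk
my_backend = RedisStore(host="my-redis-host", port=6379)
app = DashProxy(__name__,
                transforms=[ServersideOutputTransform(backend=my_backend)])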

Each replica has its own cache disk that I create at deployment. The bug also happens when I test on my local machine outside K8s, straight out of Plotly Dash. dcc.Store doesn't generate the bug.

Hello @boorhin,

The problem with disk cache and multiple deployments is that each location can only reference its own disk; this leads to the issue @Emil was referring to.

The reason this is an issue: say replica #1 handles my request and saves the info I need, then my next call goes to #2, which errors because it doesn't have the saved info from the first step in the chain.

Do you have an actual MRE that we can test out?

I don't think that is correct: the load balancer will not send you to another replica unless it crashes. In case of a crash, you start again from initialisation, and the data would not be shared. I am not in total control of what is going on in K8s, but that's basically how I am trying to run it.
The problem also happens locally, as I have said since the beginning, so it is not a volume confusion… It doesn't happen with dcc.Store on K8s either, only with Serverside.
I will see what I can do to reproduce the error. I indicated in my former post what it probably is, plus there is a bit of code there that should confirm what I am trying to explain.
Cheers

The problem with trying to replicate it on our own is that we have to build all of it from scratch, except for the piece that you gave.

And that isn't necessarily true about load balancers: they will also route to other replicas if the first one is tied up. Plus, if you want it to be seamless to the client, you will still need a shared cache. This includes when a server goes down, say, for an update. :wink:

Honestly, I don't know how you would write from one container-mounted volume to another. I think that would be a massive security breach.
For updates, the replicas are rolled out and the new images are deployed. So there is no cache on that side; each one has its own server. They do not share cache.
As I said, I will try to find the time…

@boorhin,

I'm confused how this is any more of a security breach than caching in general?

Each cache is specifically tied to the session, therefore a shared cache is no more exposed than a siloed cache.

Of course, if someone were to access the caches themselves, that could be problematic. But if you are concerned about that, I'd probably use some other way to go about it. Also, on Redis, I believe you can set time limits for how long entries stay cached.
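
For example, with plain redis-py, every key can carry an expiry (the connection details here are placeholders):

import redis

r = redis.Redis(host="localhost", port=6379)
r.set("farm-cache:demo", b"serialized-data", ex=3600)  # evicted after one hour
print(r.ttl("farm-cache:demo"))  # seconds left before the key expires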

I will try Redis, it seems better indeed.


Now, as far as the other issue, haha.

Really interesting that the data shifts…

If the issue persists locally (and/or with a single replica), it would seem like a bug. In that case, as @jinnyzor notes, please post an MWE demonstrating the issue, along with version information (for dash, dash-extensions, and the OS). Then I'll take a look.

I would not expect a load balancer to necessarily route to the same replica for each request. Hence, I would not be surprised to see issues with your configuration. To ensure consistent, stable operation, I would recommend the Redis solution (or another shared, non-ephemeral backend).

I have tried to reproduce the behaviour with a bespoke MWE, without success. The bug appeared locally and in deployment with my data. I have now totally refactored the app with Redis etc. and no longer use serverside storage, so I don't see it anymore. I am quite puzzled.
I have the feeling I have wasted a lot of time… My data were in dictionaries and contained a lot of NaNs, but apart from that, I think it is pretty much the same as what I did here with the Plotly dataset. If anyone ever stumbles upon this, here is a start…

plotly==5.13.0
dash==2.8.1
dash_bootstrap_templates==1.0.7
dash_bootstrap_components==1.2.1
dash_extensions==0.1.11

import plotly.graph_objects as go
from plotly.subplots import make_subplots
import dash
from dash import dcc
from dash.exceptions import PreventUpdate
import dash_bootstrap_components as dbc
from dash_extensions.enrich import (Output, Input, html, State, MATCH, ALL,
    DashProxy, LogTransform, ServersideOutputTransform, FileSystemStore,
    ServersideOutput)
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

def init_farm_plot():
    '''
    Initialise the curve plot for individual farms
    '''
    fig_p = go.Figure()
    fig_p.add_trace(go.Scatter(
        x=[],
        y=[],
        mode='lines+markers',
        name='Biomass',
        yaxis='y1',
        ))
    fig_p.add_trace(go.Scatter(
        x=[],
        y=[],
        mode='lines+markers',
        name='lice count',
        yaxis='y2',
        ))
    fig_p.update_layout(
        yaxis=dict(title='Potatoes chop size',
                   showgrid=False),
        yaxis2=dict(title='Mash volume',
                    overlaying='y',
                    side='right',
                    showgrid=False),
        margin=dict(b=15, l=15, r=5, t=5),
        )
    return fig_p
 

def mk_farm_evo(fig_p, name, data):
    '''
    Plot individual farm biomass
    '''
    fig_p['data'][0] = go.Scatter(x=data['date'].values,
        y=data['high'].values,
        mode='lines+markers',
        name='High',
        yaxis='y1')
    # second trace: x is offset by one day and plotted on the secondary axis
    fig_p['data'][1] = go.Scatter(x=data['date'] + timedelta(days=1),
        y=data['volume'].values,
        mode='lines+markers',
        name='Volume',
        yaxis='y2')
    return fig_p

my_backend = FileSystemStore(cache_dir="/tmp")
app = DashProxy(__name__,
                meta_tags=[{"name": "viewport", "content": "width=device-width, initial-scale=1"}],
                transforms=[ServersideOutputTransform(backend=my_backend)]
                )
server = app.server
app.layout = dbc.Container([
    dcc.Store(id='init', storage_type='session'),
    html.Button("Query data", id="btn"),
    dcc.Dropdown(
        id='dropdown_farms',
        options=[],
        searchable=True,
        placeholder='Select a fish farm',
    ),
    dcc.Graph(
        id='progress-curves',
        figure=init_farm_plot()
    ),
], fluid=True, className='dbc')

@app.callback(
    ServersideOutput('init', 'data'),
    Input("btn", "n_clicks")
)
def query_data(n_clicks):
    url = "https://raw.githubusercontent.com/plotly/datasets/master/all_stocks_5yr.csv"
    df = pd.read_csv(url)
    df['date'] = pd.to_datetime(df.date)
    return df

@app.callback(
    Output('dropdown_farms', 'options'),
    Input('init', 'data')
)
def populate_dropdown(init):   
    return init.Name.unique()

@app.callback(
    Output('progress-curves','figure'),
    Input('dropdown_farms', 'value'),
    State('progress-curves','figure'),
    State('init','data'),
)
def farm_inspector(name, fig_p, data):
    if not name:
        raise PreventUpdate
    else:
        print(name)
        print(data.loc[data['Name'] == name])
        curves = mk_farm_evo(fig_p, name, data.loc[data['Name'] == name])
        return curves


if __name__ == '__main__':
    app.run_server(host='0.0.0.0', port=8017, debug=True)