Crossfiltering (selectedpoints) with filtered dataframe or colored plot?

I took the generic crossfiltering recipe from "Part 4. Interactive Graphing and Crossfiltering" in the Dash for Python documentation and tried it on my dashboard.
However, I cannot make it work if my dataframe is pre-filtered, because pointNumber and df.index no longer correspond.
Here is a minimal working example which illustrates the problem:

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import plotly.express as px
import numpy as np
import pandas as pd


external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
df = pd.read_csv('https://raw.githubusercontent.com/ChrisG60/Diams/master/diamonds.csv')


app.layout = html.Div([
    dcc.Graph(id='carat-graph', config={'displayModeBar': False}),
])


@app.callback(
    Output('carat-graph', 'figure'),
    Input('carat-graph', 'selectedData'),
)
def update_carat(selection):

    filtered_df = df.copy()

    # FIXME: Breaks crossfiltering:
    filtered_df = filtered_df[filtered_df.cut == 'Ideal']

    selectedpoints = filtered_df.index
    if selection and selection['points']:
        selectedpoints = np.intersect1d(selectedpoints, [p['customdata'] for p in selection['points']])

    fig = px.scatter(filtered_df,
                     x='carat',
                     y='price',
                     marginal_x='histogram',
                     marginal_y='histogram',
                     )

    fig.update_traces(selectedpoints=selectedpoints,
                      customdata=filtered_df.index,
                      )

    fig.update_layout(dragmode='select')
    return fig


if __name__ == '__main__':
    app.run_server(debug=True)

Everything works perfectly unless the dataframe is filtered (see the line marked # FIXME).
If I do this, there are two problems:

  1. The scatter plot shows a weird selection pattern on loading: usually only a few points show up, and I have to double-click the plot to reset it
  2. After making a selection, points other than the ones I selected are shown as selected

From the documentation, I see that selectedpoints takes a list of pointNumbers, but these no longer match df.index once the dataframe is filtered.
Is there any way around this issue? Why do I have to use customdata in the first place, if this field cannot be used to show the selected points anyway?
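
One way to picture the mismatch: selectedpoints wants 0-based positions within the trace, while customdata here carries index labels of the original dataframe. The snippet below is only a sketch of translating labels back into positions with pandas; the helper name labels_to_point_numbers and the tiny dataframe are made up for illustration and are not part of Dash or plotly.

```python
import pandas as pd


def labels_to_point_numbers(frame, selected_labels):
    """Translate index labels into 0-based row positions of `frame`.

    Hypothetical helper for illustration only; assumes `frame.index` is unique.
    """
    positions = frame.index.get_indexer(selected_labels)
    # get_indexer returns -1 for labels that are not in the index,
    # e.g. rows that were removed by the filter.
    return sorted(int(p) for p in positions if p >= 0)


# Small stand-in for the diamonds data (made-up values).
df = pd.DataFrame(
    {
        "cut": ["Ideal", "Good", "Ideal", "Fair", "Ideal"],
        "carat": [0.2, 0.3, 0.4, 0.5, 0.6],
    }
)
filtered_df = df[df.cut == "Ideal"]  # index labels are now 0, 2, 4

# Labels as they would arrive via customdata; label 3 was filtered out.
point_numbers = labels_to_point_numbers(filtered_df, [2, 4, 3])
print(point_numbers)
```

In the callback above, this would presumably mean passing the translated positions to selectedpoints, and using range(len(filtered_df)) instead of filtered_df.index when nothing is selected. I have not verified that against every plotly version.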

dash==1.18.1
dash-core-components==1.14.1
dash-html-components==1.1.1
dash-renderer==1.8.3
dash-table==4.11.1

I found a workaround for the issue but at the same time another problem.

While it is possible to reindex the dataframe using pandas.DataFrame.reset_index (pandas 1.2.0 documentation),
the whole method fails when using a color on the dataframe.

In that case, there are different curveNumbers and if you specify a list of selectedpoints it will show the selection per curve, i.e. if there are 6 curves and 4 selected points, then you will get 6*4 selected points - 4 in each curve.

See this example:

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import plotly.express as px
import numpy as np
import pandas as pd


external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
df = pd.read_csv('https://raw.githubusercontent.com/ChrisG60/Diams/master/diamonds.csv')


app.layout = html.Div([
    dcc.Graph(id='carat-graph', config={'displayModeBar': False}),
])


@app.callback(
    Output('carat-graph', 'figure'),
    Input('carat-graph', 'selectedData'),
)
def update_carat(selection):

    # Reset Index is necessary to create a correspondence between index and selectedpoints
    filtered_df = df.copy().sample(100, random_state=1337).reset_index()

    selectedpoints = filtered_df.index
    if selection and selection['points']:
        selectedpoints = np.intersect1d(selectedpoints, [p['pointNumber'] for p in selection['points']])

    fig = px.scatter(filtered_df,
                     x='carat',
                     y='price',
                     color='cut',
                     marginal_x='histogram',
                     marginal_y='histogram',
                     )

    fig.update_traces(selectedpoints=selectedpoints)
    fig.update_layout(dragmode='select')

    return fig


if __name__ == '__main__':
    app.run_server(debug=True)

How can you properly do the selection in that case?
I read that plotly.graph_objects.Figure (4.14.3 documentation) has some options to select the trace.
But for each curve the numbering starts at zero, so there is again no correspondence between pointNumber and the index, and I do not know which point I'm looking at.
Setting customdata does not work here either, because I would need to know beforehand which points are in which trace.
How should this work? Does anyone have a hint for me?
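
A rough sketch of how per-trace bookkeeping could look, in case it helps anyone landing here: if every point carries a stable row ID, the selected IDs can be turned into one selectedpoints list per trace. The function below is plain Python and only shows that mapping; the name selectedpoints_per_trace and the example IDs are hypothetical, and it assumes you can read the IDs of each trace in plotting order (for example from each trace's customdata when px.scatter is called with custom_data set to an ID column).

```python
def selectedpoints_per_trace(trace_ids, selected_ids):
    """Return one list of 0-based point positions per trace.

    trace_ids    -- one sequence of row IDs per trace, in plotting order
    selected_ids -- row IDs of the points that should appear selected

    Hypothetical helper for illustration; not part of plotly or Dash.
    """
    selected = set(selected_ids)
    return [
        [pos for pos, row_id in enumerate(ids) if row_id in selected]
        for ids in trace_ids
    ]


# Made-up example: three traces (e.g. three values of `cut`) with their row IDs.
trace_ids = [[10, 13, 17], [11, 12], [14, 15, 16]]

# IDs collected from a selection; 99 does not belong to any trace.
result = selectedpoints_per_trace(trace_ids, [13, 14, 16, 99])
print(result)
```

Each inner list would then be assigned to the matching trace's selectedpoints instead of passing a single list to fig.update_traces. Note that traces without IDs (such as marginal histograms) would need to be skipped, and that an empty list means "nothing selected" for that trace rather than "no selection active".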

Hi, @reox
I have run into the same issue that you are having! Did you ever find a workaround or a solution?

Thanks, Derrick

I think the last idea I had was to use something that can be tracked through the brushing, such as an ID.
However, I could not figure out how to do that reliably, so I did not bother with it any longer. It was just coursework after all, and I got a grade on it even without that :wink:
I think this is something that has to be implemented in plotly directly and is hard to work around.