Detecting changed cell in Editable Datatable

Hi,

The table of contents for the editable page of the datatable docs mentions a section on detecting which cell has changed in the datatable after an edit, but there is no such section below. Is there any example of this functionality anywhere? Is the solution just to diff the json blobs from before and after the edit?

Thanks!

1 Like

that is the recommended solution right now :+1:

1 Like

Got it, thanks for the help!

1 Like

I needed to track changes to a large DataTable so here’s what I came up with using pandas:

import dash
from dash.dependencies import Input, Output, State
import dash_table
import dash_core_components as dcc
import dash_html_components as html
from dash.exceptions import PreventUpdate

import numpy as np
import pandas as pd


def diff_dashtable(data, data_previous, row_id_name="row_id"):

    """Generate a diff of Dash DataTable data.

    Parameters
    ----------
    data: DataTable property (https://dash.plot.ly/datatable/reference)
        The contents of the table (list of dicts)
    data_previous: DataTable property
        The previous state of `data` (list of dicts).

    Returns
    -------
    A list of dictionaries in form of [{row_id_name:, column_name:, current_value:,
        previous_value:}]
    """

    df, df_previous = pd.DataFrame(data=data), pd.DataFrame(data_previous)

    for _df in [df, df_previous]:

        assert row_id_name in _df.columns

        _df = _df.set_index(row_id_name)

    mask = df.ne(df_previous)

    df_diff = df[mask].dropna(how="all", axis="columns").dropna(how="all", axis="rows")

    changes = []

    for idx, row in df_diff.iterrows():

        row_id = row.name

        row.dropna(inplace=True)

        for change in row.iteritems():

            changes.append(
                {
                    row_id_name: row_id,
                    "column_name": change[0],
                    "current_value": change[1],
                    "previous_value": df_previous.at[row_id, change[0]],
                }
            )

    return changes


app = dash.Dash(__name__)


dt_changes = []

COLUMNS = ["apple", "pear", "orange"]

N_ROWS = 1000


def create_data(columns, n_rows):

    size = n_rows * len(COLUMNS)

    data = np.random.randint(0, n_rows, size=size).reshape(n_rows, len(COLUMNS))

    df = pd.DataFrame(data=data, columns=COLUMNS)

    df.index.name = "row_id"

    return df.reset_index().to_dict(orient="records")


app.layout = html.Div(
    [
        dcc.Store(id="diff-store"),
        html.P("Changes to DataTable:"),
        html.Div(id="data-diff"),
        html.Button("Diff DataTable", id="button"),
        dash_table.DataTable(
            id="table-data-diff",
            columns=[{"id": col, "name": col, "type": "numeric"} for col in COLUMNS],
            editable=True,
            data=create_data(COLUMNS, N_ROWS),
            page_size=20,
        ),
    ]
)


@app.callback(
    Output("diff-store", "data"),
    [Input("table-data-diff", "data_timestamp")],
    [
        State("table-data-diff", "data"),
        State("table-data-diff", "data_previous"),
        State("diff-store", "data"),
    ],
)
def capture_diffs(ts, data, data_previous, diff_store_data):

    if ts is None:

        raise PreventUpdate

    diff_store_data = diff_store_data or {}

    diff_store_data[ts] = diff_dashtable(data, data_previous)

    return diff_store_data


@app.callback(
    Output("data-diff", "children"),
    [Input("button", "n_clicks")],
    [State("diff-store", "data")],
)
def update_output(n_clicks, diff_store_data):

    if n_clicks is None:

        raise PreventUpdate

    if diff_store_data:

        dt_changes = []

        for v in diff_store_data.values():

            dt_changes.append(f"* {v}")

        return [dcc.Markdown(change) for change in dt_changes]

    else:

        return "No Changes to DataTable"


if __name__ == "__main__":

    app.run_server(debug=True)
2 Likes

Above solution was very helpful but I think it missed some things (wouldn’t fire if you deleted an item since it shows up as a null value. Solution might have other troubles with blanks too?). I modified to:

def diff_dashtable(data, data_previous, row_id_name=None):
    """Generate a diff of Dash DataTable data.

    Modified from: https://community.plotly.com/t/detecting-changed-cell-in-editable-datatable/26219/2

    Parameters
    ----------
    data: DataTable property (https://dash.plot.ly/datatable/reference)
        The contents of the table (list of dicts)
    data_previous: DataTable property
        The previous state of `data` (list of dicts).

    Returns
    -------
    A list of dictionaries in form of [{row_id_name:, column_name:, current_value:,
        previous_value:}]
    """
    df, df_previous = pd.DataFrame(data=data), pd.DataFrame(data_previous)

    if row_id_name is not None:
        # If using something other than the index for row id's, set it here
        for _df in [df, df_previous]:

            # Why do this?  Guess just to be sure?
            assert row_id_name in _df.columns

            _df = _df.set_index(row_id_name)
    else:
        row_id_name = "index"

    # Pandas/Numpy says NaN != NaN, so we cannot simply compare the dataframes.  Instead we can either replace the
    # NaNs with some unique value (which is fastest for very small arrays, but doesn't scale well) or we can do
    # (from https://stackoverflow.com/a/19322739/5394584):
    # Mask of elements that have changed, as a dataframe.  Each element indicates True if df!=df_prev
    df_mask = ~((df == df_previous) | ((df != df) & (df_previous != df_previous)))

    # ...and keep only rows that include a changed value
    df_mask = df_mask.loc[df_mask.any(axis=1)]

    changes = []

    # This feels like a place I could speed this up if needed
    for idx, row in df_mask.iterrows():
        row_id = row.name

        # Act only on columns that had a change
        row = row[row.eq(True)]

        for change in row.iteritems():

            changes.append(
                {
                    row_id_name: row_id,
                    "column_name": change[0],
                    "current_value": df.at[row_id, change[0]],
                    "previous_value": df_previous.at[row_id, change[0]],
                }
            )

    return changes

Here is my clientside solution to this problem:

app.clientside_callback(
    """
    function (input,oldinput) {
        if(JSON.stringify(input) != JSON.stringify(oldinput)) {
            for (i in Object.keys(input)) {
                newArray = Object.values(input[i])
                oldArray = Object.values(oldinput[i])
                if (JSON.stringify(newArray) != JSON.stringify(oldArray)) {
                    entNew = Object.entries(input[i])
                    entOld = Object.entries(oldinput[i])
                    for (const j in entNew) {
                        if (entNew[j][1] != entOld[j][1]) {
                            changeRef = [i, entNew[j][0]] 
                            break        
                        }
                    }
                }
            }
        }
        return changeRef
    }    
    """,
    Output('output-container', 'children'),
    [Input('TableID', 'data')],
    [State('TableID', 'data_previous')]
)

changeRef[0] shows the row, and changeRef[1] shows the column name. Can easily be changed to column number by replacing to “changeRef = [i, j]”. If you want a list of multiple changes you can remove “break” and add changeRef to an array “changes” with changes.push(changeRef).

1 Like

If you make multiple changes (not all at once) to your table, would ‘oldinput’ update to the modified table after every change?

Thanks for sharing btw!

Hi,
data_previous (oldinput) is indeed updated at every change. So if you make a string of changes this callback would be triggered every time, and returns which cell has been changed at that step. Because of this it is super useful to run it clientside, which means you can run the code without any latency issues.

I tried multiple changes with your recommendations, but it still gives me one change.
Any ideas?

Found it! I did not realise that data_previous is not fixed from the start of loading the table. What i did is making a callback with a button click as input and data_previous as output. In the clientside callback I have the button click and data_previous as input, and table data as state. And I started the clientside function with an if statement: if (n_clicks > 0) to avoid error for empty data_previous.

Great function, saved me a lot of time!! There is a small bug when indexing the data frames with with row_id_name. On my system it had to be done inplace.

_df = _df.set_index(row_id_name) should be

_df = _df.set_index(row_id_name, inplace=True)

Interestingly if your data table has all numeric values, you can use this trick.
Modifying the editable data table makes the value str or string. Maybe because you don’t know whether the user will give digits or alphabets.

In my case, my data was all numeric, using the check

if isinstance(cell_value, str)

I was able to determine which row or cell was modified. I am using this in production, so if this trick will fail in future dash updates (currently using dash==2.0.0), please let me know :joy:

2 Likes

Question: how would you modify your code to render the latest updates to the same datatable id=‘table-data-diff’ either from memory or from persistence storage instead of calling create_data()