Datatable: Sort and filter by value of different column or cusom formatting function

Hi all,

I have Byte value data that ranges from 0 to PB PiB, i.e. very long integers.
I want to display them in human-readable form but maintain the sorting and filtering of the Integer values.

How could this be done?
Currently, I’m displaying two columns, one with the original data for sorting and filtering and one with the converted data which I compute in the pandas DataFrame.
Also I’m using

sort_action='native',
filter_action='native' if (len(data) > 2) else 'none',
filter_options={"case": "insensitive"}

Thanks!

Edit: I’ve been looking to understand how money (dash/FormatTemplate.py at dev · plotly/dash · GitHub) works in order to eventually write something similar that does the trick, however, due to the complete lack of documentation in there and in dash/Format.py at dev · plotly/dash · GitHub especially, I have no idea how any of this works…

Hi @gothicVI

Try using dash-ag-grid. You can format numbers using d3-format , then the sort and filter will still work. It’s likely you would want to use the SI-prefix as the specifier.

Here’s and example


import dash_ag_grid as dag
from dash import Dash, html
import pandas as pd
import numpy as np

app = Dash(__name__)

df = pd.DataFrame(np.random.randint(0, 10000000000,  size=(100, 3)), columns=list("abc"))

grid = dag.AgGrid(
    rowData =  df.to_dict("records"),
    columnDefs=[{"field": i, "valueFormatter":{"function": "d3.format('(.2s')(params.value)"}} for i in "abc"],
    defaultColDef={"sortable":True,  "filter": "agNumberColumnFilter"}
)

app.layout = html.Div(grid)

if __name__ == "__main__":
     app.run_server(debug=True)



1 Like

Hi @AnnMarieW

thanks so much this is getting really close!
However, the si-unit conversion uses powers of 10; I’d need powers of 2 for conversion.
I guess I was inaccurate in my original post and should have said PiB instead of PB - sorry for that.

I’m not sure if d3 can do that automatically, but you can make a custom renderer component. Can you give an example of exactly how you want the numbers to be displayed?

1 Like

Hi @AnnMarieW

so far I had used the following function:

def convert_units(array: list[float]) -> list[str]:
    """Convert Bytes to appropriate human readable values."""
    return_list: list[str] = []
    for i in array:
        if np.isnan(i):
            return_list.append("N/A")
        elif (i > 2**50):
            return_list.append(f"{np.round(i / 2**50, decimals=2)}PB")
        elif (i > 2**40):
            return_list.append(f"{np.round(i / 2**40, decimals=2)}TB")
        elif (i > 2**30):
            return_list.append(f"{np.round(i / 2**30, decimals=2)}GB")
        elif (i > 2**20):
            return_list.append(f"{np.round(i / 2**20, decimals=2)}MB")
        elif (i > 2**10):
            return_list.append(f"{np.round(i / 2**10, decimals=2)}KB")
        else:
            return_list.append(f"{np.round(i, decimals=2)}B")
    return return_list

Hi @gothicVI

Try adding this function to the dashAgGridFunctions.js file in the /assets folder


var dagfuncs = window.dashAgGridFunctions = window.dashAgGridFunctions || {};

dagfuncs.convertUnits = function(x) {
  const units = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB'];
  if (isNaN(x)) {
    return 'N/A';
    } else {
      const unitIndex = Math.min(Math.floor(Math.log2(x) / 10), units.length - 1);
      const convertedValue = x / (2 ** (10 * unitIndex));
      return `${convertedValue.toFixed(2)}${units[unitIndex]}`;
    }
}


Then you can use it in your app like this:

import dash_ag_grid as dag
from dash import Dash, html
import pandas as pd
import numpy as np

app = Dash(__name__)

df = pd.DataFrame(np.random.randint(0, 10000000000,  size=(100, 3)), columns=list("abc"))

grid = dag.AgGrid(
    rowData =  df.to_dict("records"),
    columnDefs=[{"field": i, "valueFormatter":{"function": "convertUnits(params.value)"}} for i in "abc"],
    defaultColDef={"sortable":True,  "filter": "agNumberColumnFilter"}
)

app.layout = html.Div(grid)

if __name__ == "__main__":
     app.run_server(debug=True)


1 Like

Hi @AnnMarieW

thanks so much! By adding the missing } in the java script code here’s a fully working minimal example for everyone to see the difference:

import dash_ag_grid as dag 
from dash import Dash, html
import pandas as pd
import numpy as np

app = Dash(__name__)

df = pd.DataFrame(np.random.randint(0, 10000000000000,  size=(5000, 1)), columns=list("b"))

grid = dag.AgGrid(
    rowData =  df.to_dict("records"),
    columnDefs=[
        {"field": "b", "headerName": "Bytes"},
        {"field": "b", "headerName": "Bytes-base_10", "valueFormatter":{"function": "d3.format('(.2s')(params.value)"}},
        {"field": "b", "headerName": "Bytes-base_2", "valueFormatter":{"function": "convertUnits(params.value)"}},
    ],  
    defaultColDef={"sortable":True,  "filter": "agNumberColumnFilter"}
)

app.layout = html.Div(grid)

if __name__ == "__main__":
     app.run_server(debug=True)

Final question though: do you think it would be possible to allow for filtering?
Like, what if I’d want to see values less than 5TB? Currently, I’d need to enter the Bytes value, i.e., a huge number. Do you think that string-to-number conversion could be added, i.e., allowing to enter 5TB?

Hi @gothicVI
Sorry for the typo in the function, and I like our updated app.
For those just passing through, here’s what it looks like:

image

It is possible to allow custom number support in filters. You can find an example in the AG Grid docs here: React Data Grid: Number Filter

It’s on the list to add this example to the dash-ag-grid docs. If you get this working, I’ll add your example to the docs - it’s a great use-case :slight_smile:

1 Like

Hi @AnnMarieW

as far as I understand the following code should to the trick, or?
I have no experience w.r.t. java script at all :slight_smile:

const gridOptions = {
    columnDefs: [
        {
            field: 'age',
            filter: 'agNumberColumnFilter',
            filterParams: {
                allowedCharPattern: '\\d+(?:B|KB|MB|GB|TB|PB)?',
                numberParser: text => {
                    if (text == null) {
                        return null;
                    } else {
                        const pattern = /(^\d+)?([BKMGT]?B)$/i;
                        const match = text.match(pattern);
                        if (match) {
                            const number = parseFloat(match[1]);
                            const unit = match[2].toUpperCase();
                            const units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];
                            const multiplier = Math.pow(1024, units.indexOf(unit));
                            return number * multiplier;
                        } else {
                            return null;
                        }
                    }
                }
            }
        }
    ],

    // other grid options ...
};

Where would this needed to be placed?

Thanks for adding this to the docs - if we get the filtering working I’d argue to add it as well as a full package :smiley:

Hello @gothicVI,

You were indeed close, this took me a while to sort out, place these in your JS file:

dagfuncs.bytes = () => {
        return {
        allowedCharPattern: '\\d(.)+\\(?:B|KB|MB|GB|TB|PB)',
        numberParser: (text) => {
            if (text == null) {
                return null;
            } else {
                const pattern = /(^[0-9.]+)?([BKMGT]?B)$/i;
                const match = text.match(pattern);
                console.log(match)
                if (match) {
                    const number = parseFloat(match[1]);
                    const unit = match[2].toUpperCase();
                    const units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];
                    const multiplier = Math.pow(1024, units.indexOf(unit));
                    return number * multiplier;
                } else {
                    return text;
                }
            }
        }
    }
}

dagfuncs.convertUnits = function(x) {
  const units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];
  if (isNaN(x)) {
    return null;
    } else {
      const unitIndex = Math.min(Math.floor(Math.log2(x) / 10), units.length - 1);
      const convertedValue = x / (2 ** (10 * unitIndex));
      return `${convertedValue.toFixed(2)}${units[unitIndex]}`;
    }
}

And here is the app setup:

import dash_ag_grid as dag
from dash import Dash, html
import pandas as pd
import numpy as np

app = Dash(__name__)

df = pd.DataFrame(np.random.randint(10000000,  size=(5000, 1)), columns=list("b"))

grid = dag.AgGrid(
    rowData =  df.to_dict("records"),
    columnDefs=[
        {"field": "b", "headerName": "Bytes"},
        {"field": "b", "headerName": "Bytes-base_10", "valueFormatter":{"function": "d3.format('(.2s')(params.value)"}},
        {"field": "b", "headerName": "Bytes-base_2", "valueFormatter":{"function": "convertUnits(params.value)"}},
    ],
    defaultColDef={"sortable":True,  "filter": "agNumberColumnFilter",
                   "filterParams": {'function': 'bytes()'}, 'floatingFilter': True}
)

app.layout = html.Div(grid)

if __name__ == "__main__":
     app.run_server(debug=True)

Note, I had to reduce the amount for the numpy random.

1 Like

Hi @AnnMarieW & @jinnyzor

thanks so much for your input.
I’ve tweaked the code a bit and added proper JS for the base 10 column.

One final (?) question though:
Is it possible to limit the allowed filter options? equals for instance does not make any sense because due to rounding errors it is impossible to get the correct value in bytes back. Thus limiting the options to >, <, and range is probably enough.

Edit: Forgot to add the code:
assets/dashAgGridFunctions.js:

var dagfuncs = window.dashAgGridFunctions = window.dashAgGridFunctions || {};

dagfuncs.convertUnits_2 = function(x) {
  const units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];
  if (isNaN(x)) {
    return 'N/A';
    } else {
      const unitIndex = Math.min(Math.floor(Math.log2(x) / 10), units.length - 1);
      const convertedValue = x / (2 ** (10 * unitIndex));
      return `${convertedValue.toFixed(2)}${units[unitIndex]}`;
    }
}

dagfuncs.convertFilter_2 = () => {
        return {
        allowedCharPattern: '\\d(.)+\\(?:B|KB|MB|GB|TB|PB)',
        numberParser: (text) => {
            if (text == null) {
                return null;
            } else {
                const pattern = /(^[0-9.]+)?([KMGTP]?B)$/i;
                const match = text.match(pattern);
                console.log("convertFilter_2: " + match)
                if (match) {
                    const number = parseFloat(match[1]);
                    const unit = match[2].toUpperCase();
                    const units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];
                    const multiplier = Math.pow(1024, units.indexOf(unit));
                    return number * multiplier;
                } else {
                    return text;
                }
            }
        }
    }
}

dagfuncs.convertFilter_10 = () => {
        return {
        allowedCharPattern: '\\d(.)+\\(?:B|KB|MB|GB|TB|PB)',
        numberParser: (text) => {
            if (text == null) {
                return null;
            } else {
                const pattern = /(^[0-9.]+)?([KMGTP]?B)$/i;
                const match = text.match(pattern);
                console.log("convertFilter_10: " + match)
                if (match) {
                    const number = parseFloat(match[1]);
                    const unit = match[2].toUpperCase();
                    const units = ['B', 'KB', 'MB', 'GB', 'TB', 'PB'];
                    const multiplier = Math.pow(1000, units.indexOf(unit));
                    return number * multiplier;
                } else {
                    return text;
                }
            }
        }
    }
}

main.py:

import dash_ag_grid as dag
from dash import Dash, html
import pandas as pd
import numpy as np

app = Dash(__name__)

df = pd.DataFrame(
    {"bytes": np.random.randint(0, 10000000000, size=5000)}
)

grid = dag.AgGrid(
    rowData=df.to_dict("records"),
    columnDefs=[
        {
            "field": "bytes",
            "headerName": "Bytes"
        },
        {
            "field": "bytes",
            "headerName": "Bytes-base_10",
            "valueFormatter": {"function": "d3.format('(.2s')(params.value)"},
            "filterParams": {'function': 'convertFilter_10()'}
        },
        {
            "field": "bytes",
            "headerName": "Bytes-base_2",
            "valueFormatter": {"function": "convertUnits_2(params.value)"},
            "filterParams": {'function': 'convertFilter_2()'}
        }
    ],
    defaultColDef={
        "sortable": True,
        "filter": "agNumberColumnFilter"
    },
    dashGridOptions={"enableCellTextSelection": True}
)

app.layout = html.Div(grid)

if __name__ == "__main__":
    app.run_server(debug=True)

Yes, you can alter the options via the filterOptions inside the filterParams. Just add it as a new key to the object you are returning from the JS function.

@jinnyzor
Thanks for the hint.
However, that somewhat does not seem to have an effect…
Also, trying to add a reset button does not always seem to work.
I’ll be back with an example later this evening.

Edit: With the following code

  • The Bytes column has the reset button but no choose able filter types
  • The Bytes-base_* columns have all the filter types but no reset button
import dash_ag_grid as dag
from dash import Dash, html
import pandas as pd
import numpy as np

app = Dash(__name__)

df = pd.DataFrame(
    {"bytes": np.random.randint(0, 10000000000, size=5000)}
)

grid = dag.AgGrid(
    rowData=df.to_dict("records"),
    columnDefs=[
        {
            "field": "bytes",
            "headerName": "Bytes",
            "filterParams": {
                'filterOptions': ['equals', 'lessThan', 'greaterThan', 'inRange'],
                'defaultOption': 'equals',
                'buttons': ['reset'],
            }
        },
        {
            "field": "bytes",
            "headerName": "Bytes-base_10",
            "valueFormatter": {"function": "d3.format('(.2s')(params.value)"},
            "filterParams": {
                'function': 'convertFilter_10()',
                'filterOptions': ['lessThan', 'greaterThan', 'inRange'],
                'defaultOption': 'inRange',
                'buttons': ['reset'],
            }
        },
        {
            "field": "bytes",
            "headerName": "Bytes-base_2",
            "valueFormatter": {"function": "convertUnits_2(params.value)"},
            "filterParams": {
                'function': 'convertFilter_2()',
                'filterOptions': ['lessThan', 'greaterThan', 'inRange'],
                'defaultOption': 'inRange',
                'buttons': ['reset'],
            }
        }
    ],
    defaultColDef={
        "sortable": True,
        "filter": "agNumberColumnFilter"
    },
    dashGridOptions={"enableCellTextSelection": True}
)

app.layout = html.Div(grid)

if __name__ == "__main__":
    app.run_server(debug=True)

I can’t see an obvious error though. Could this be due to the added JS?
But then I don’t understand how it would affect the Bytes column as we do not have a custom JS functionality here…