How to create many pages from different datasets without a separate py file for each dataset?

Can I create many pages in my dash app based on an arbitrary number of datasets, without needing a separate py file for every one?

I am working on an app which plots datasets that are pulled from many different locations. There are multiple types of datasets, and I have a layout and callbacks for each dataset type which plot the data as needed. Many different datasets of these types currently exist and more will be created in the future.

I would like the scripts to create these pages/URLs etc. without needing to copy a py file for each dataset. I suppose creating those py files could be automated in the scripts as well, but it feels unnecessary: the datasets are all so similar, the layout scripts already work on whichever one I choose (provided I select the right layout for the given dataset), and very little would change within the layout file.

I have been reading the documentation on multipage apps, and attempting various ways to accomplish this, but I feel I am missing some relatively obvious way to feed the directory name into the scripts and make a page without a new file every time, using the subdir names to generate unique URLs. My understanding of webapps is limited, so this may be a simple case of my just needing more research on the topic.

dash: 2.16.1
dcc: 2.13.1
dash html: 2.0.17
plotly: 5.19.0

For code examples, I only have simple boilerplate code as described in the documentation for multipage dash apps as I am merely trying to find the proper way to structure this project.

Thank you!

See the example by @AnnMarieW at dash-multi-page-app-demos/multi_page_basics_pathname_prefix/app.py at main · AnnMarieW/dash-multi-page-app-demos · GitHub

Maybe you could do something similar to the dash leaflet docs, where you iterate over the files in the folder, and register a page for each (data) file?
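A rough sketch of the folder-scanning part of that idea, using only the standard library. The names `LAYOUT_TYPES` and `discover_pages`, and the rule that a dataset's type is a keyword in its folder name, are assumptions for illustration (the `xyz` keyword mirrors the folder-name check that appears in this thread). Each returned entry holds what one `dash.register_page(...)` call would need.

```python
from pathlib import Path

# Hypothetical dataset-type keywords; a folder whose name contains one of
# these is assumed to use that layout type.
LAYOUT_TYPES = ("xyz", "abc")


def discover_pages(base_dir):
    """Return one page spec per recognised dataset sub-directory of base_dir."""
    pages = []
    for sub_dir in sorted(Path(base_dir).iterdir()):
        if not sub_dir.is_dir():
            continue  # ignore loose files next to the dataset folders
        name = sub_dir.name
        layout_type = next((t for t in LAYOUT_TYPES if t in name.lower()), None)
        if layout_type is None:
            continue  # unknown dataset type: no page is created for it
        pages.append({
            "module": f"{layout_type}_{name}",  # unique page name
            "path": f"/{layout_type}/{name}",   # unique URL built from the folder name
            "layout_type": layout_type,
            "data_dir": sub_dir,
        })
    return pages
```

After the app has been created with `use_pages=True`, each spec could be passed on as `dash.register_page(spec["module"], path=spec["path"], layout=...)`, picking the layout according to `spec["layout_type"]`.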

Hi @Travelmug and welcome to the Dash community :slight_smile:

It sounds like you might be able to use path variables. You can find a description in the docs here:

For each layout type, you could import the file inside the layout function based on the path variable. That way, you would only have to maintain one .py file for each layout type.

Here is an example for one of the layout types:

import dash
import pandas as pd
from dash import html

dash.register_page(__name__, path_template="/report/<report_id>")


def layout(report_id=None, **kwargs):
    df = pd.read_csv(f"assets/report_type1/{report_id}.csv")

    return html.Div(
        f"The user requested report ID: {report_id}."
    )
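Because `report_id` comes straight from the URL, it may be worth checking it before building a file path from it. A minimal sketch, assuming the reports are CSV files in one folder; `resolve_report` is a hypothetical helper, not part of Dash.

```python
from pathlib import Path


def resolve_report(report_id, base_dir="assets/report_type1"):
    """Return the CSV path for report_id, or None if no such report exists."""
    if not report_id:
        return None
    base = Path(base_dir).resolve()
    candidate = (base / f"{report_id}.csv").resolve()
    # Reject ids such as "../secret" that would escape the reports folder.
    if base not in candidate.parents:
        return None
    return candidate if candidate.is_file() else None
```

Inside the layout function, `path = resolve_report(report_id)` could be followed by returning a short "report not found" message when `path` is `None`, and by `pd.read_csv(path)` otherwise.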

Hi @AnnMarieW thank you for the welcome, it's good to be here and this is a wonderful set of tools :smiley:

I have been trying to get some of these solutions working, but I am still confused and I suspect it's a conceptual misunderstanding on my part.

Within the layouts (which were written as standalone pages), I call file pre-processing functions to massage the data. After that there are multiple callbacks which rely on the output of those pre-processing function calls. I tried enclosing those function calls and callbacks into the layout function, then putting register_page using path_template with <report_id> in the layout file, and then setting the report_id as a parameter for the layout function. Then, I import the layout file in app.py, and call the layout function with the full path to the valid sub-dirs which contain the datasets as arguments in app.py.

My thinking was that by calling the layout function from app.py it would register the valid pages and then dcc.Link would link to these pages.

Using structure as described above with code like below I am getting this error:

In the callback for output(s):
xyz-results.figure
Output 0 (xyz-results.figure) is already in use.
To resolve this, set allow_duplicate=True on
duplicate outputs, or combine the outputs into
one callback function, distinguishing the trigger
by using dash.callback_context if necessary.

I tried setting prevent_initial_callbacks in app and also allow_duplicate=True in the callback.

Additionally, at one point I did have a page come up which had 2 links in it, but both had None for the suffix and produced duplicate callback errors even after allow_duplicate=True etc. were set.

I don't know if the code snippets below will help (they won't run without my including other functions and data), but I hope they will make it clearer how I've set things up and where I might be going wrong.

Just to clarify, when it comes to the 'reports/<report_id>' set in the path_template: is the reports path where my datasets would be, or my pages, or something else? I'm not sure exactly what this path is referring to, only that it will end up in the URL.

I will continue working with this as it may be some simple errors here on my end, but just in case I am taking the wrong approach:

APP:

import os

import dash
from dash import Dash, dcc, html

# I'd like this to be a user selected path at some point, but for now I will just get the path to the
# root of my current datasets:

base_path = f'{os.getcwd()}\\datasets\\'

app = Dash(__name__,
           use_pages=True,
           prevent_initial_callbacks="initial_duplicate")
 
# Importing my layout function:
import reports.xyz_layout as xyz

# This function gets all the sub directories of a given dir. Code from this point to 'xyz.layout(dir)' will
# likely be simplified later, but for now I'm just trying to figure how to call the layout function
# using the dir with the datasets:
files = pf.get_ds_files(base_path)
 
dir_contents = []
for sub_dir in files:
    dir_contents.append(f'{base_path}{sub_dir}')

for dir in dir_contents:
    if 'xyz' in dir.lower():
        xyz.layout(dir)
 
# My thinking was that above would grab all the applicable 'xyz' directories and register them as
# pages. Then, dcc.Link below would link to each page in the page_container:
app.layout = html.Div([
    html.Div([
        html.Div(
            dcc.Link(f"{page['name']} - {page['path']}", href=page["relative_path"])
        ) for page in dash.page_registry.values()
    ]),
    dash.page_container,
 
])
 
if __name__ == '__main__':
    app.run(debug=True)

LAYOUT:

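# Here's a simplified layout, it only has a couple components but it's enough
# to get started:
import functools

import dash
import plotly.express as px
from dash import Input, Output, callback, dcc, html

dash.register_page(__name__, path_template="/reports/<report_id>")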
 
def layout(report_id=None, **kwargs):
    # Below gets all files within a directory:
    dataset_files = pf.get_ds_files(report_id)
    # --- Data processing execution block --- At the end of this block we have DFs and also some
    # dictionaries which are used in callbacks.
    xyz_flist = pf.file_preprocess_flow_manager(dataset_files)
    xyz_dfs = pf.process_xyzf_to_df(xyz_flist)
    xyz_stats, xyz_stat_df = pf.get_xyz_stats(xyz_dfs)

    @callback(
        Output(component_id='xyz-results', component_property='figure', allow_duplicate=True),
        Input(component_id='xyz-sources', component_property='value')
    )
    @functools.lru_cache(maxsize=32)
    def update_xyz_graph(choice):
 
        color = xyz_stat_df['metric']
 
        n0 = [xyz_stat_df['a/a'], xyz_stat_df['a/b']]
        n1 = [xyz_stat_df['b/b'], xyz_stat_df['b/a']]
 
        choices = {'a': n0, 'b': n1}
 
        fig = px.bar(xyz_stat_df, x='n', y=choices[choice], title='XYZ Results', color=color)
 
        fig.update_layout(title_x=0.5)
        fig.update_traces(textfont_size=12, textangle=0, textposition="outside", cliponaxis=False)
 
        return fig

    return html.Div([

    html.Div([
 
        dcc.Graph(
            figure={},
            id='xyz-results',
            style={'width': '95vw', 'height': '50vh'},
        ),

        dcc.RadioItems(
            options=['a', 'b'],
            value='a',
            id='xyz-sources',
            inline=True
        ),

    ]),

    html.Hr(),

    ])

Thank you for the help, please let me know if I can clarify anything!

Hi @Travelmug

I'm guessing that you are fairly new to Dash? If so, I'd recommend breaking things down into smaller pieces and getting those working first. If you create a minimal example that uses sample data, then we would have a complete example to run if you get stuck. You can find more info here: How to Get your Questions Answered on the Plotly Forum

For example, you could first make a single page app that handles all the report variations for one report type. Be sure to review the section of the docs on sharing data between callbacks.

Next, if you are using large data (i.e. millions of rows), use background callbacks rather than functools.lru_cache.

Once that is working, the next step could be turning it into a multi-page app. Note that when using pages, Dash will automatically import the layouts from the pages folder and create the routing callback for you "under the hood". Some of your errors are probably because your layouts are being imported twice. Note also that you can't define callbacks inside the layout function: it runs each time the page is displayed, while callbacks must all be registered before the app starts. More info here: Callback Gotchas | Dash for Python Documentation | Plotly

When you get to the point of making a multi-page app, it will be helpful to clone this repo and run the examples locally so you understand how to structure a multi-page app and can start with a working app frame.

I hope this helps!

Hi @AnnMarieW, thank you again for the assistance. Yes, I am new to Dash.

Currently I have 4 single page apps (SPAs), each of which handles a different dataset type. These SPAs do work independently, and utilize background callback caching. I run them from the command line using -d /'directory'/ where 'directory' contains the files for one dataset. I will somehow need to replace that argparse -d option with the path variables you described.

I will clone and study the repo and read more documentation. Thank you

If I understand right, what you want to do is fairly straightforward. The basic point is that you don't need a pages folder to create a multi-page app: pages can instead be created by calls to register_page().

For example, this short piece of self-contained code needs no pages folder and creates a separate page for each continent in the gapminder data, each showing a simple graph of population by year for that continent.

from dash import Dash, html, dcc, register_page, page_container
import plotly.express as px

df = px.data.gapminder()
df = df.groupby(["continent","year"])["pop"].sum().reset_index()

app = Dash(__name__, use_pages=True, pages_folder="") # No pages folder

links = []
for continentname, continentdata in df.groupby("continent"):
    register_page(
        continentname, 
        layout=html.Div([
            html.H1(continentname),
            dcc.Link("Home", href="/"),
            dcc.Graph(figure=px.line(continentdata, x="year", y="pop", markers=True))
        ]), 
        path=f"/{continentname}"
    )
    links.append(html.Div(dcc.Link(continentname, href=f"/{continentname}")))

register_page("homepage", layout=html.Div(links), path="/")

app.layout = html.Div([html.H1("Demo of multiple dynamically created pages"), 
                       page_container])

if __name__ == "__main__":
    app.run(debug=False, host='0.0.0.0')  

Thank you @davidharris, this does look promising. I've been working with it for a bit now trying to add callbacks, and I have something working below. My target list of pages would be something like this:

Page 1 - Layout W (callback ABC, dataset 0)
Page 2 - Layout X (callback DEF, dataset 1)
Page 3 - Layout Y (callback GHI, dataset 2)
Page 4 - Layout Z (callback JKL, dataset 3)
Page 5 - Layout W (callback ABC, dataset 4)
Page 6 - Layout X (callback DEF, dataset 5)
Page 7 - Layout Y (callback GHI, dataset 6)
...

Layouts and callbacks repeat, but pages and datasets do not. My current single page apps look like this:

* imports
* argparse function # get directory name for dataset
* process data
* app init
* layout
* callbacks
* app.run

If I'm understanding things right, the component ids used in callbacks need to be unique across pages. My thinking is I should turn the above SPA structure into a function which takes in the directory_name and a value to append to the callback input and output ids to make them unique across pages. Within the function the directory_name would be used to pull the data and process it.

Here is a version of what you provided which does most of what I described above:

from dash import Dash, html, dcc, register_page, page_container, callback, Output, Input
import plotly.express as px

df1 = px.data.gapminder()
df1 = df1.groupby(["continent", "year"])["pop"].sum().reset_index()

df2 = px.data.gapminder()
df2 = df2.groupby(["year", "continent"])["gdpPercap"].sum().reset_index()

app = Dash(__name__, use_pages=True, pages_folder="")  # No pages folder

id_count = 0

def continent_layout(continentname, continentdata, x=None, y=None, id_num=None):
    layout = html.Div([
            html.H1(continentname),
            dcc.Link("Home", href="/"),
            dcc.RadioItems(
                options=[col for col in continentdata.columns],
                value='continent',
                id=f'input_color_{id_num}',
                inline=True
            ),
            dcc.Graph(
                figure={},
                id=f'output_color_{id_num}'),
        ])

    @callback(
        Output(component_id=f'output_color_{id_num}', component_property='figure'),
        Input(component_id=f'input_color_{id_num}', component_property='value')
    )
    def update_color(chosen_color):
        fig = px.bar(continentdata, x=x, y=y, color=chosen_color)
        return fig

    return layout

links = []

for continentname, continentdata in df1.groupby("continent"):
    register_page(
        f'{continentname}_pop',

        layout=continent_layout(continentname, continentdata,
                                x='year',
                                y='pop',
                                id_num=id_count),

        path=f"/{continentname}_pop"
    )
    links.append(html.Div(dcc.Link(f'{continentname}_pop', href=f"/{continentname}_pop")))
    id_count += 1

for continentname, continentdata in df2.groupby("continent"):
    register_page(
        f'{continentname}_gdpPercap',

        layout=continent_layout(continentname, continentdata,
                                x='year',
                                y='gdpPercap',
                                id_num=id_count),

        path=f"/{continentname}_gdpPercap"
    )
    links.append(html.Div(dcc.Link(f'{continentname}_gdpPercap', href=f"/{continentname}_gdpPercap")))
    id_count += 1

register_page("homepage", layout=html.Div(links), path="/")

app.layout = html.Div([html.H1("Demo of multiple dynamically created pages"),
                       page_container])

if __name__ == "__main__":
    app.run(debug=True)

This looks like it is heading in the right direction for my use-case, but I'm unsure of best practices in this scenario and what pitfalls may be around the corner.

Any additional feedback would be greatly appreciated, but you've all been so helpful already. Thank you so much!

Hi @Travelmug

Looks like you are making progress. Pattern-matching callbacks might also be an option.

More info in the dash docs:

You can find a couple live examples here:

Thank you @AnnMarieW, there are a couple places in my case where pattern-matching callbacks would be useful :slight_smile: