Dash Python Multipage App: Handle large CSV file

Hi there, I am currently using a multipage App for analytical purposes. Since the csv files for this project are very small, I can simply include the import (i.e. pd.read_csv()) in each individual App file. Every time I click on an App, its csv file is imported again.

I want to use a similar multipage App for a different project where the files are orders of magnitude bigger. Hence this approach does not work. How can I import the csv files once when I start the App so all the individual Apps can access those pandas dataframes when the App is running, thereby avoiding the repetitive import?

I hope I made my issue clear. If I missed something in the documentation, it would be enough if you pointed that out. But so far I have not found a solution.

Thank you very much for your help

Is the .csv file static? And is it small enough that it will fit in memory?

Yes to both questions.

In that case I would just load it into memory as a global variable when the app loads. That’s easy, fast, and as long as the data is read-only it should be OK. In terms of structure, the data could be loaded in a separate module, which is then imported by all modules that need access to the data.

And where would I place the import statement? I actually tried to do this before asking the question but it did not work. Currently the import statement is inside the index.py file and I use the complete filename (to avoid any complications with the working directory). What do I have to change or add to make this work?

NameError: name 'df' is not defined

Say that you have a data.py file with contents like this:

df = ...

You would then import the variable in other modules like this:

from data import df

Does that make sense? Or am I missing something in your data flow? :grin:

I see what you mean. It works.

But I don’t understand how the csv file is loaded into RAM as a pandas dataframe only once. Shouldn’t the command from data import df trigger the df = pd.read_csv() command every time it runs?

Or is the data.py file executed exactly once (which resides in the same directory as the index.py file), thereby loading the data into RAM and the command from data import df just refers to this file? Currently the index.py file has no reference to the data.py file, only the individual Apps inside the Apps directory.

I am not that experienced with Python :sweat_smile:
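
On the question of whether data.py runs only once: Python executes a module's top-level code the first time the module is imported in a process and then caches the module object in sys.modules. Later import statements, including from data import df in other files, reuse that cached module. Here is a minimal sketch of that behavior. The throwaway module demo_data and the plain dict are hypothetical stand-ins for data.py and the real dataframe:

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for data.py: its top-level code counts how often it runs.
MODULE_SOURCE = '''
import os
os.environ["DEMO_LOADS"] = str(int(os.environ.get("DEMO_LOADS", "0")) + 1)
df = {"rows": 3}  # placeholder for df = pd.read_csv(...)
'''

# Start from a clean counter and write the module to a temporary directory.
os.environ.pop("DEMO_LOADS", None)
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_data.py").write_text(MODULE_SOURCE)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# First import: Python finds demo_data.py, runs it, and stores it in sys.modules.
first = importlib.import_module("demo_data")
# Second import (as another page module would do): served from the cache.
second = importlib.import_module("demo_data")
# The from-import form goes through the same cache.
from demo_data import df

load_count = int(os.environ["DEMO_LOADS"])  # how often the module body ran
same_module = first is second               # both imports return one module object
same_object = df is first.df                # every importer shares one df object
print(load_count, same_module, same_object)
```

Run as a script, this should report a single load, with both imports pointing at the same object. That is why every page module can see the same dataframe without the csv being read again.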

@Emil I am working with the solution you suggested and overall it works fine, but I noticed that my functions are executed twice.

  • I have a file with functions
  • I call them in the data.py file
  • I import the resulting variables from data.py in the individual Apps

When I start the App I can see that they are run twice: once before the App has loaded and once right afterwards.

function_abc was called
Dash is running on http://127.0.0.1:8050/
 * Serving Flask app "app" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
function_abc was called

Do you know how this can be avoided?
Thank you
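
A likely cause, assuming the app is started with debug=True (the log above shows "Debug mode: on"): Flask's development server then runs with the Werkzeug reloader, which starts a second Python process that actually serves the app. Each process imports data.py once, so the module-level function calls show up twice. The sketch below uses a hypothetical helper, is_reloader_child, to label which process a call comes from. It relies on the WERKZEUG_RUN_MAIN environment variable, which the reloader sets to "true" in the serving process:

```python
import os


def is_reloader_child(environ=None):
    """Return True inside the process that the Werkzeug reloader spawns.

    Hypothetical helper for illustration; it only inspects an environment variable.
    """
    env = os.environ if environ is None else environ
    return env.get("WERKZEUG_RUN_MAIN") == "true"


def function_abc():
    # Stand-in for the function from the post; reports which process called it.
    role = "serving process" if is_reloader_child() else "parent process or no reloader"
    print(f"function_abc was called in pid {os.getpid()} ({role})")
    return role


function_abc()
```

If the duplicate work is a problem during development, passing use_reloader=False alongside debug=True to app.run_server (Dash forwards extra keyword arguments to Flask's run) should keep everything in a single process, at the cost of automatic restarts when the code changes. With debug turned off, the reloader is not used.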