Deploying your Dash app to Heroku - THE MAGICAL GUIDE

Hi Dan,

Great post! I followed this along with Charming Data on YouTube to create my web app. Now I’m curious what I can do to have a script running in the background, separate from the web app, while still being able to access the data that my script produces. Here is my take on this:

Procfile:

web: gunicorn app:server
worker: python3 main.py

My web app, “app”, basically gets its data from a ‘text.csv’ generated by ‘main.py’ (the worker).

My ‘main.py’ (worker) basically runs a crontab-style task each day at 9am that saves data to ‘text.csv’.
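
For context, my worker looks roughly like the sketch below (simplified; I’m using the schedule library here as one way to run the daily 9am job inside the worker dyno, and fetch_data() is just a placeholder for my real data pull):

import time

import pandas as pd
import schedule  # pip install schedule


def fetch_data():
    # Placeholder for the real job: grab the data and save it to text.csv
    df = pd.DataFrame({"current_time": [pd.Timestamp.now()], "data1": [42]})
    df.to_csv("text.csv", index=False)


# Run the job every day at 9am (dyno local time), then sleep between checks
schedule.every().day.at("09:00").do(fetch_data)

while True:
    schedule.run_pending()
    time.sleep(60)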

The problem is that when I deploy this, the web app and the worker run on different servers, so each has its own directory. Do you know what I can do so that my worker (‘main.py’) can write to ‘text.csv’ and my web app can then retrieve the data from ‘text.csv’?

Any tips on what I can do would be awesome!

Thanks!

1 Like

Hi, glad my article helped get you up and running.

It sounds to me like you want to perform a regular scheduled refresh of some data (i.e. your worker process producing a .csv), and then you want your front-end Dash app (web process) to be able to serve this to the client browser. And you just want a basic solution.

The problem is that these processes are completely independent (as you pointed out) and can be thought of as running on independent, isolated virtual machines. So there is no way to persist data between them. There are some Heroku docs here covering a similar use case, which you may have already found.

Solution Option
What you need to do is push the .csv file generated by your worker to a centralised location that both processes can access. Something like an AWS S3 bucket or an Azure Blob store would probably do the trick (you would need a paid subscription, but it would be super cheap if you just want small volume and low bandwidth). You may also be able to push the .csv to GitHub raw user content in a public repo. In either case you will need to authenticate from your worker process to connect to those services, and that might be the tricky part. You can definitely connect to S3 and Azure Blob from Python (allowing you to read and write files). You would need to import the relevant modules and do some reading on how to perform these actions from your Python code. I’m most familiar with Azure; docs here for how to use their SDK.
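
If you went the S3 route, for example, the round trip from Python is only a few lines with boto3 - a rough sketch, with a placeholder bucket name, and assuming your AWS credentials are already set as environment variables:

import boto3  # reads AWS credentials from the environment

s3 = boto3.client("s3")

# Worker side: push the freshly generated csv up to the bucket
s3.upload_file("text.csv", "my-example-bucket", "text.csv")

# Web app side: pull the latest csv down before reading it
s3.download_file("my-example-bucket", "text.csv", "text.csv")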

It’s probably not that robust, but if you can just overwrite the same file each time you update it (or timestamp the .csv files) then it should work. I’d first check that I can read the .csv file from the cloud store in the web app as a test, and once that’s working, turn attention to updating it via the worker. There are probably better approaches, but that’s how I’d start :slight_smile:

2 Likes

Good summary @dan_baker !

Also FYI to our Dash Enterprise customers who might be reading this - in a forthcoming release of Dash Enterprise, we’re adding support for shared, persistent filesystems between processes in the Procfile. There will be a special directory that you’ll be able to write to that is shared between each process container listed in the Procfile, and also shared with the workspace (the onboard IDE), so that you can inspect the file and even edit it manually.

2 Likes

Hi Dan,

I’m relatively new to coding, so your explanation and the technical terms you used are very useful/helpful to me as I build up my coding skills. I’ll look into your suggestion - you are right that setting up the connection to read/write data is the tricky part, more so for a new coder like me :rofl:.

The other option I have been looking into is a Heroku database, but that is quite a learning curve as well.

Thanks again for your thorough explanation. :grinning::+1:t4:

@ngaunguyens if you are deploying worker processes and using cronjobs I’d say you are doing pretty well for a new coder :slight_smile:

I’ve provided a little extra detail below on exactly how I would approach your problem, and kept it in this thread in case others encounter a similar challenge with accessing data between workers and web processes. Using Azure as an example.

Understanding cloud storage
This is the tricky part if you’ve never used AWS S3 or Azure Blob cloud stores before. It’s quite a bit to take in but worth the time to learn. You can think of them like a Google Drive or Dropbox for files, but they are specially designed to be connected to from code, such as your Python app. And they can store any type of file, providing super fast access to it from incoming TCP connections (i.e. anywhere on the internet). I’d watch a few YouTube video tutorials that showcase how it’s done; search for “connect to Azure Blob with Python” or something like that. And follow the Azure guide I linked to earlier.

The critical thing to understand is that once you’ve set up your Azure storage account, in order to access it from your Python code in an automated fashion, you need special credentials so that your Python app can authenticate and securely connect to your cloud store. Once that’s done you’re on easy street and you can read/write any files you like to your cloud store, directly from your Python code. Step-by-step instructions are in the Azure guide linked above. But essentially you copy your Azure credentials from the Azure web portal that you log in to as a human, and then paste them into an environment variable that your Python code will use at run-time to connect to Azure.
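
As a quick sanity check once the credentials are in place, something like this sketch (using the azure-storage-blob v12 SDK) should print your container names; if it does, the connection string is working:

import os

from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

# Never hard-code the connection string; read it from the environment
connect_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
service = BlobServiceClient.from_connection_string(connect_str)

# If authentication works, this lists the containers in your storage account
for container in service.list_containers():
    print(container.name)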

This is what I would do.

Step 1: Get your csv file into cloud storage
Before even starting on the code, get an example .csv data file into Azure Blob. Once you’ve set up your storage account, you can use a desktop application called Azure Storage Explorer, which allows you to manage everything; it’s similar to Windows Explorer. I’d connect this to your cloud store, then you can manually upload your .csv to a new storage container and see that it’s there! Check out this guide to help.

Step 2: Get your web process (Dash) app to READ the .csv stored in Azure (not locally)
Now you want to modify your Dash app to read the .csv file directly from Azure at run-time. This is where you would need to follow the guide to set up Python to talk to Azure (and create the special environment variable with your Azure credentials etc). If you are determined, it’s not too hard to brute-force this and learn. Instead of reading the .csv from your local disk, you are now replacing that code with code that connects to Azure and reads the .csv file straight from the cloud storage container. The connection is usually super fast (less than 500ms) for your Python app to connect and read files on Azure Blob.
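
As a rough sketch of that change (container and file names are placeholders), instead of pd.read_csv against a local file you would do something like:

import os

import pandas as pd
from azure.storage.blob import ContainerClient

connect_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
container_client = ContainerClient.from_connection_string(connect_str, "your-container-name")

# Pull text.csv out of the storage container, then read it as usual
blob_client = container_client.get_blob_client("text.csv")
with open("text.csv", "wb") as f:
    f.write(blob_client.download_blob().readall())

df = pd.read_csv("text.csv")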

Step 3: Get your worker python app to WRITE to the .csv file on Azure blob
The final step is to WRITE over (or write a new) .csv file in your Azure storage container. You can reuse the connection code from the step above in your worker Python app, noting that of course you will have to change some actions to actually overwrite the file now, rather than read it. And you’d also need to set up the environment variable in your worker Python app too.
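
The write is just the mirror image of the read - reusing the container_client from the sketch above, something like:

# Overwrite the blob with the freshly generated local text.csv
with open("text.csv", "rb") as data:
    container_client.get_blob_client("text.csv").upload_blob(data, overwrite=True)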

Setting environment variables in Heroku
The one thing that might stump you is setting environment variables (containing your Azure credentials) so your Python code can connect to Azure. In Heroku you can do this through the web portal or via the CLI. Guide here. You might need to do some testing to ensure you can successfully access the environment variable in your Python app (e.g. printing it to the console etc). Once you’re sure it’s working, you can use it in the code snippets to connect to Azure that you will see in the tutorials etc.
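
For example, after setting the config var (via the dashboard, or heroku config:set AZURE_STORAGE_CONNECTION_STRING=... from the Heroku CLI), a tiny check like this at the top of your app will tell you whether the process can actually see it:

import os

# Confirm the Heroku config var is visible to this process;
# print only a masked summary so the secret never ends up in the logs
connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
if connect_str:
    print(f"Connection string loaded ({len(connect_str)} chars)")
else:
    print("AZURE_STORAGE_CONNECTION_STRING is not set!")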

That’s it! In theory this should allow you to read/write any file to a remote cloud store from any of your running python apps! :smiley:

2 Likes

@dan_baker thanks so much for your guidance! Your steps took some time to learn, and I now understand more of the coding flow for this type of problem and Heroku’s environment variables. I followed your steps and I’m able to do the following:
Step 1 & Step 3: Get my csv file into cloud storage, with my worker python app WRITING to the .csv file on Azure Blob
Using the following code, put together from YouTube and the guide you shared:

import os

from azure.storage.blob import ContainerClient

# Function to upload files to Azure Blob Storage

connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
container = "your-preferred container"

def upload(file, connection_string, container_name):
    # Need an instance of the blob CONTAINER client class.
    # This class lets you manipulate an Azure storage container and its blobs.
    container_client = ContainerClient.from_connection_string(connection_string, container_name)
    print("uploading files to blob storage...")
    blob_client = container_client.get_blob_client(file)
    with open(file, "rb") as data:  # overwrite existing file
        blob_client.upload_blob(data, overwrite=True)
        print(f"{file} uploaded")

Setting environment variables in Heroku
All I had to do was copy my storage account’s connection string (‘AZURE_STORAGE_CONNECTION_STRING’) into Heroku’s Config Vars using the Heroku dashboard, following the source you shared: Quickstart: Azure Blob Storage library v12 - Python | Microsoft Docs

Step 2: Dash app reading the .csv stored in Azure
I’m able to download the .csv file locally with the following code:

def download(dl_file_name, connection_string, container_name):
    container_client = ContainerClient.from_connection_string(connection_string,
                                                              container_name)
    blob_client = container_client.get_blob_client(dl_file_name)

    dir_root = os.path.dirname(os.path.abspath(__file__))
    download_file_path = os.path.join(dir_root, 'text1.csv')
    with open(download_file_path, 'wb') as download_data:
        download_data.write(blob_client.download_blob().readall())

However, I’m not able to get the ‘text1.csv’ file when I try to download via my web app.

Thanks a lot for fueling my hobby!

Edit:

It looks like I really can’t download the file from Azure to the /app directory:

(screenshot: Heroku bash session showing no text1.csv under the /app directory)

My code downloads the file, but when I check the files via bash there is no file under the /app directory. Locally, I’m able to download text1.csv without any problem.

I figured this out. I think the problem in my previous post is that Heroku has an ephemeral file system. Therefore, I cannot have one Python script write the file (“text.csv”) to the app’s directory “/app” and then have the @callback access that “text.csv” later. This is very tricky, and it is such a small nuance that it took me hours to debug.

So my solution is to call the download code directly in the @callback; otherwise I cannot access the “text.csv” file from Azure:

import os

import pandas as pd
import plotly.graph_objects as go
from azure.storage.blob import ContainerClient
from dash import Input, Output, callback


@callback(
    Output('live-graph', 'figure'),
    Input('interval-update', 'n_intervals'),
)
def live_data(n_interval):

    connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
    container = "your-preferred container"
    # get_azure_data()

    dl_file_name = 'text.csv'
    dir_root = os.path.dirname(os.path.abspath(__file__))
    download_file_path = os.path.join(dir_root, 'text.csv')

    print("\nDownloading blob to \n\t" + download_file_path)
    container_client = ContainerClient.from_connection_string(connect_str,
                                                              container)
    blob_client = container_client.get_blob_client(dl_file_name)

    with open(download_file_path, 'wb') as download_data:
        download_data.write(blob_client.download_blob().readall())

    test = pd.read_csv(download_file_path)

    fig = go.Figure()

    fig.add_trace(
        go.Scatter(
            x=test['current_time'],
            y=test['data1'],
            name='data1'
        )
    )
    return fig

The following setup WILL NOT WORK, even though it looks like it should:

@callback(
    Output('live-graph', 'figure'),
    Input('interval-update', 'n_intervals'),
)
def live_data(n_interval):
    get_data()

Where get_data() is a function from a script, for example script.py:

## script.py
import os

import pandas as pd
import plotly.graph_objects as go
from azure.storage.blob import ContainerClient


def get_data():
    connect_str = os.getenv('AZURE_STORAGE_CONNECTION_STRING')
    container = "your-preferred container"
    # get_azure_data()

    dl_file_name = 'text.csv'
    dir_root = os.path.dirname(os.path.abspath(__file__))
    download_file_path = os.path.join(dir_root, 'text.csv')

    print("\nDownloading blob to \n\t" + download_file_path)
    container_client = ContainerClient.from_connection_string(connect_str,
                                                              container)
    blob_client = container_client.get_blob_client(dl_file_name)

    with open(download_file_path, 'wb') as download_data:
        download_data.write(blob_client.download_blob().readall())

    test = pd.read_csv(download_file_path)

    fig = go.Figure()

    fig.add_trace(
        go.Scatter(
            x=test['current_time'],
            y=test['data1'],
            name='data1'
        )
    )

    return fig
 
1 Like

Awesome @ngaunguyens, glad you got it working :slight_smile:

Yes, I suspected you might run into issues with Heroku’s ephemeral filesystem. You can write to it at run-time, but each dyno has its own isolated filesystem and anything written there can disappear without warning whenever the dyno restarts, so it’s a bad idea to rely on it.

With your new cloud store solution there is really no need to store any .csv files on the underlying Heroku dyno (even temporarily); however, you will be used to working with .csv files from your previous local-disk solution and the more familiar read_csv methods etc. I have one more :gift: for you.

You can stream Azure Blob data directly to memory instead of downloading as a file
If you want to add an extra refinement, note that you can stream the data from Azure directly into memory in your Python code, rather than downloading the .csv as a file (which you then need to read with read_csv). This works in both directions, for reading and writing.

Here is a working example of how to do this (and another one here); it uses the get_blob_to_stream method from the older Azure BlockBlobService SDK, which you can import. If you use this approach, you can probably neaten up your code and callbacks as you had originally attempted to do before resorting to the work-around. :blush:
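
For instance, with the v12 ContainerClient you are already using (rather than the older BlockBlobService), a sketch of the in-memory version would look roughly like this, with placeholder container and file names:

import io
import os

import pandas as pd
from azure.storage.blob import ContainerClient

connect_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
container_client = ContainerClient.from_connection_string(connect_str, "your-container-name")

# READ: stream the blob straight into a DataFrame, no temp file on the dyno
csv_bytes = container_client.get_blob_client("text.csv").download_blob().readall()
df = pd.read_csv(io.BytesIO(csv_bytes))

# WRITE: push a DataFrame back up without touching local disk
container_client.get_blob_client("text.csv").upload_blob(
    df.to_csv(index=False).encode("utf-8"), overwrite=True
)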

2 Likes

Thanks for all the tips @dan_baker!

I will keep streaming Azure Blob data in mind for another project. Currently it works, and if I mess with the code it will more than likely break :joy:.

Thanks again, and hopefully some newbies will find this post useful later down the road :slight_smile: .

Cheers!

1 Like