So you have your Dash app running on your local machine and you’re finally ready to share it with the world on a public site.
The problem is: words like Git, Flask, Gunicorn and Heroku sound like strange mythical creatures, even after a few drinks. Worry not: having just gone through the process of deploying Dash to Heroku myself for the first time, I’ll share what I’ve learned along the way. I’ll outline some surprising pitfalls and solutions I found in the hope this will save you time and effort. This is the guide I wish I had when I started.
Background
In particular, I think the process to successfully deploy a Dash app on Heroku, as an example use case, is not trivial for many first timers. For example, simply serving static files (like audio, images, video) does not work out of the box with Heroku like it does from your local machine. I didn’t realise this, along with a few other quirks.
This essay is designed to supplement the existing documentation and attempts to fill in some gaps, explain the quirks, and provide a very brief fly-over of each component and how everything fits together. I’ll share notes from my personal experience. I’ll also attempt to explain the technology stack based on my imperfect understanding of how it all works.
No doubt I’ve got many things wrong so I welcome corrections.
Assumptions
- You have a running Dash app locally hosted (with a requirements.txt)
- You are relatively new to Python
- You have a strong cup of coffee
- You have never deployed a public web app before
- Words like Heroku and Gunicorn scare you.
The Problem
In my research I scanned many blog articles and guides about deploying Dash to Heroku but found most to be a little lacklustre or not specific to Dash. They are typically bare minimum, light on explanation, and don’t outline the key issues you will encounter in the detail you need. If you’re working with Dash, chances are you are a newbie to Flask as well, so many concepts are not well understood. I’ve yet to see a guide that outlines the core concepts and pitfalls for specifically deploying a Dash app to Heroku. I believe this lack of comprehensive guidance is a (solvable) barrier to entry for many Dash users like myself.
How much pain is this going to be?
How hard really is it to Deploy to Heroku as a first timer? How much pain is involved? What problems will I encounter? In short, I’d say optimistically it can be done in a few hours, but realistically about 10hrs for a first timer. Pain meter: medium spicy.
This is because it takes time to set up the environments like GitHub and the Heroku command line interface, add special new files to your project directory (repository), modify the code in a few spots, and get used to the commands you need in order to see what’s going on and make it all work. It can be a little daunting at first, but once you’ve done the initial setup, you’re on easy street. Deploying code running on your laptop to your live public web app is a few mouse clicks and a single command in a terminal, which is super cool. (Caveat: this does not cover security and authentication; this is purely to deploy a hobby app.)
Why Heroku?
Like myself, you have probably heard of Heroku as a well loved platform for deploying hobbyist web applications for free. Of course it’s not the only one, there are many, and it is scalable to enterprise level deployments. But it is universally known and liked by the community, so I chose to go this route and see what all the fuss was about. Verdict so far: loving it. Key reasons the community loves Heroku (as far as I know):
- It’s free for hobby apps, which is great to get started and for demos
- It’s got clear, concise documentation
- It natively supports Python web apps (read: minimal config needed)
What is Heroku?
It’s a platform-as-a-service (PaaS) for deploying and hosting web applications. In the context of your Dash app, this means Heroku provides the physical hardware (storage, compute), software (linux/unix/sql) services, and dependencies (packages), in a containerised environment to deploy and host your application on a publicly accessible URL (end-point). It does this through provision of virtualised linux containers called “Dynos” which essentially act as your personal linux webserver, with ram, cpu, and linux ‘installed’. (It’s not quite like this in reality but a good analogy).
Dynos come in a variety of types and can be scaled vertically (more ram, compute, storage per instance) or horizontally (duplicate dynos in parallel) as your specific project requirements demand. This can be done almost instantaneously at the command line. The free version gets you one dyno with up to 500MB storage and 500MB ram. It sleeps after 30 minutes of inactivity, presumably so Heroku resources are not drained. So the catch with the free version is that your website can take a good 30-60 seconds to load initially, as your free dyno is provisioned on demand. If you go to a paid plan, starting at about $7USD/month, your dyno(s) stay on and ready 24hrs/day.
Why GitHub?
In short - Heroku natively supports deploying repositories that reside in GitHub. This is good news. Basically it means if your project is already in a GitHub repository (free for private/public repos) then you can easily deploy it on Heroku AFTER you have added a few additional files that are outlined in the deployment guides, and in a section further below.
If you’ve been developing your pet project on a local machine, this is an important step to take it to a public (or private) cloud repository. It’s a good move anyway because you have full versioning history, you are protected from hdd failure, and you can share publicly or privately etc. It does however come with intellectual debt, with some interesting concepts and terminology to get your head around, like clone, fork, merge, push, pull, commit.
Yet another barrier to newcomers is the issue with security and how this affects you accessing and changing the code in your cloud repository on GitHub or similar. Basically GitHub wants a secure connection between your computer and its servers before it will happily accept code changes. There are two main ways it achieves this: using credential authentication over HTTPS (requiring a username/pass every time a connection is made), or via SSH public/private key encryption, which takes some extra setup, particularly on Windows. This extra complication, combined with the other scary words like clone, fork, merge, can be a little overwhelming at first. Fear not, there is a desktop app that can set up a secure connection between itself and your GitHub repo, facilitating seamless easy updates to your code repository.
If you are developing on a Windows/Mac machine (which I assume the majority of first timers are), I’d highly recommend getting the GitHub Desktop application. This just makes the process of cloning, fetching and pushing changes back to your repository on GitHub MUCH easier without the need for any command line. It’s not that I’m against command line, it’s just that this particular process can be clunky on Windows, requiring either user credential authentication or SSH keys. (If you are more hardcore you can of course install Windows Subsystem for Linux, a way to have a fully functioning Linux system on Windows without partitioning your HDD. If you do this you can set up SSH keys and enjoy the benefits of Linux for managing your code repo, but for first timers it’s really not necessary.) In short, you can avoid a lot of hassle just by using the Desktop app and setting up the security within the app; then it’s a single click of the mouse to commit changes and push updated code to your repo on GitHub.
What is GIT?
No one really knows.
What is Gunicorn?
When you figure it out, let me know. Gunicorn, to the best of my understanding, is a production-ready HTTP (WSGI) server specifically for Python web applications which runs natively on Unix. If you’ve been developing your Dash app purely on your local machine at ‘http://127.0.0.1:8050/’ (or another localhost port such as 8080), you will be running the lightweight development HTTP server that ships with Flask (via Werkzeug) rather than with Python itself. This is not Gunicorn. It’s likely you have not yet glimpsed this rare and mythical creature of the forest.
The local development HTTP server (shipped with Flask, which Dash installs for you) is started automatically when your Dash app is executed on your local machine. The issue is, it’s not designed for handling incoming traffic from a production website, so when you deploy to the web you need a production-ready HTTP server. A popular one is Gunicorn. Notably, Heroku provides native support for Gunicorn, which makes things easy. It’s all outlined in the guides, but just to clarify, all you need to do is add a single line of code to your Dash app (‘server = app.server’), add Gunicorn to your requirements.txt so that Heroku installs it at deployment (you can install it locally too, but remember Gunicorn itself only runs on Unix), and reference it in a special file you will create called the Procfile. More on this later, but I think it’s worth briefly touching on the HTTP server as it’s all a bit mysterious the first time.
Web is hard
This is a simple truth. Web is multi-layer, multi-language, multi-protocol, multi-platform, multi-user. It’s a mind boggling chain of infrastructure bolted to other infrastructure to make a modern web application run. For many non-IT people (and IT people for that matter), even the concept of a locally hosted webserver takes a bit of abstract thought, let alone understanding the true technology stack that lies underneath a real application. It’s also worth reflecting on just how new some of this technology is, so I’ve indicated the year these tools were created in the table below.
The simplified technology stack
This is imperfect, so please help me to correct it. But it’s useful, I think, to see some of the layers required to get your code actually deployed onto the web. We start with your actual code at the very top of the stack, and drill down layers all the way to Heroku.
Layer | Created | Name | Note |
---|---|---|---|
User Code | today | Your code | Python code for your application |
Web application | 2017 | Dash | Allows entire website to be written in Python by wrapping web components and facilitating 2-way communication (callbacks) |
Javascript library | 2015 | Plotly.js | Allows access to powerful ecosystem of data visualisations (40 chart types) that run responsively client side. Dash is built on top of this library. The library itself is built on top of d3.js and stack.gl. |
Javascript library | 2013 | React.js | An open-source, front end, JavaScript library for building user interfaces or UI components. Notably Dash is written on top of this. |
Web framework | 2010 | Flask | Collection of modules and libraries to abstract away hard things in web like protocols and threads. Flask is the underlying web application that sits under Dash. Dash is, in essence, a Flask application. |
Web template engine | nfi | Jinja | Template engine for Python. Something important for serious developers. Novices can ignore. |
WSGI toolkit | nfi | Werkzeug | List of web application libraries (used by Flask natively). Novices can ignore. |
WSGI (HTTP) Server | 2010 | Gunicorn | A popular python HTTP server; the thing that manages incoming requests from the browser. |
Code Repository | 2008 | Github | A free facility to store, collaborate, manage, update and deploy your code securely in the cloud. |
Web Deployment & Hosting | 2007 | Heroku | A scalable platform-as-a-service (PaaS) to deploy and physically host your web application and make it accessible on the internet. |
The point I’m trying to make here is that this grossly oversimplified web technology stack is still far from simple; to say nothing about front end layers such as javascript, CSS etc. Web is hard because of the sheer number of abstraction layers. Dash, to me, is a beautiful abstraction that builds on everything below it to simplify what is actually an insanely complex machine: the modern data-rich web application.
Dash-Heroku deployment, in a nutshell
What actually needs to be done:
- Dash app running on localhost
- Install Git
- Setup github account (+ recommend install Github Desktop)
- Setup Heroku account (+ install the command line interface)
- Add dependencies and special files (i.e. add Gunicorn to requirements.txt, add ‘server = app.server’ to your app, create Procfile and runtime.txt)
- Clone repo from github to local machine (only once)
- Create heroku app linked to your repo (only once, ref deployment guides, heroku CLI)
- Commit and push your code changes to github repo (repetitively)
- Deploy/Re-deploy Heroku app by pushing changes from Heroku CLI (“git push heroku main”)
Deployment Guides
The guides below are concise and useful, and I would of course start with these. If I’m honest I think they are a little light on detail for newcomers and would benefit greatly by having a supplementary explanatory guide akin to something like this essay.
- Plotly’s Dash deployment guide https://dash.plotly.com/deployment
- Heroku’s guide: Getting Started on Heroku with Python (Heroku Dev Center)
- Recommendations: Install Heroku command line interface (CLI) and GitHub Desktop if a windows/mac user
- Also, this youtube tutorial from a fellow Plotly Community Forum member.
The magical ingredients to add to your project
A quick note on the special files you need uniquely to get your python project deployed to Heroku. This is outlined in the deployment guide, so I’ve just provided a few notes from my experience:
Ingredient 1: Procfile
This strange extensionless file must reside in your project root, and tells Heroku how to handle web processes (in our case using Gunicorn HTTP server) and the name of your Python application.
Typically the Procfile would contain a single line:
web: gunicorn app:server
Where:
- ‘web:’ tells Heroku the dyno main process is a web process
- ‘gunicorn’ tells Heroku that the HTTP server to use is Gunicorn (for which it has native support)
- ‘app’ references the filename of the main python file without the .py extension. So if you follow the convention of ‘app.py’ you would use ‘app’ here. But note if your main python file is ‘anything.py’, you would have ‘anything’ in place of ‘app’.
- ‘server’ references the underlying flask app. Commonly you would define a variable ‘server = app.server’ and this references that variable, I believe. To be more confusing, the ‘app’ in this variable declaration actually refers to the dash instantiation variable in the snippet below:
import dash

app = dash.Dash(__name__)  # the Dash app object
server = app.server  # the underlying Flask server that Gunicorn runs
Yes I know what you’re thinking, this is finicky and it’s really easy to misunderstand with all these ‘app’ references everywhere. Take home is: as long as you are using an app.py main file, as is the convention, and you declare a ‘server = app.server’ line of code after your Dash declaration, you can use the example Procfile and it should work. If you get anything with the Procfile wrong: pain and suffering will ensue.
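To make the ‘app:server’ naming a little less mysterious, here is a rough sketch of how a ‘module:variable’ string can be turned into a Python object. This is not Gunicorn’s actual code, just an illustration of the idea as I understand it. The resolve_target helper and the stand-in module are hypothetical, made up so the sketch runs without Dash installed.

```python
import importlib
import sys
import types


def resolve_target(target):
    """Split 'module:variable' and return the object it names (illustrative only)."""
    module_name, _, variable_name = target.partition(":")
    module = importlib.import_module(module_name)  # e.g. imports app.py
    return getattr(module, variable_name)          # e.g. the `server` variable


# Stand-in for a real app.py (an assumption for this sketch, not a real Dash app).
fake_app = types.ModuleType("app")
fake_app.server = "pretend this is the Flask server"
sys.modules["app"] = fake_app

print(resolve_target("app:server"))
```

So the part before the colon has to match your file name, and the part after the colon has to match a variable defined at the top level of that file.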
To make the Procfile, from memory in Windows, you can just create a text file, enter the single line. Then strip out the extension. (This worked for me and I do not need to have a secondary Procfile.win which is sometimes talked about in the documentation)
Ingredient 2: runtime.txt
This file (which must also be in your root project folder) simply tells Heroku which Python runtime to use. Currently it can contain a single line, e.g.:
python-3.7.8
Just create this as a notepad .txt file in windows. Done.
That’s really it. It’s mainly these two files (Procfile, runtime.txt) that Heroku needs in your repo project directory in order to work. As long as you have followed the basics, and added Gunicorn to your requirements.txt etc, in theory you are good to go.
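If you want a quick sanity check before deploying, something like the sketch below could be run from your project root. It only covers the basics described above (the three files exist, the Procfile starts with ‘web:’, and Gunicorn is listed in requirements.txt). The check_deploy_files function is my own hypothetical helper, not part of Heroku or Dash.

```python
from pathlib import Path


def check_deploy_files(project_dir):
    """Return a list of problems found in a project folder (illustrative checks only)."""
    root = Path(project_dir)
    problems = []

    # The three files discussed above must sit in the project root.
    for name in ("Procfile", "runtime.txt", "requirements.txt"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")

    procfile = root / "Procfile"
    if procfile.is_file() and not procfile.read_text().lstrip().startswith("web:"):
        problems.append("Procfile does not start with 'web:'")

    requirements = root / "requirements.txt"
    if requirements.is_file() and "gunicorn" not in requirements.read_text().lower():
        problems.append("gunicorn is not listed in requirements.txt")

    return problems


if __name__ == "__main__":
    for problem in check_deploy_files(".") or ["no obvious problems found"]:
        print(problem)
```

An empty list does not guarantee a successful deploy, it just rules out the most common slip-ups.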
Ingredient 3: perseverance
Not to be underestimated, dogged perseverance and determination is a key ingredient to the potion.
It’s magic time
You’ve got your code in a GitHub repository, with the required tweaks and files created. You have Heroku CLI installed and have created a Heroku app linked to your GitHub repo. It’s 4am and the sun is coming up soon. It’s show time.
Deploy from the Heroku command line interface:
git push heroku main
These four words are the spell that makes the magic happen.
Type them into the Heroku CLI in the right conditions, sit back smugly, and enjoy the show.
For those new to Heroku, if everything has worked after your “git push heroku main” from the Heroku CLI, your app will be deployed to a Heroku subdomain like:
If this is the case, recommend a little dance.
Copy-paste the URL displayed in the Heroku CLI into your browser and get ready…to be disappointed. Chances are the first time you will see “APPLICATION ERROR” or something like that. Don’t panic.
The first thing you should do is bring up the log (which is effectively your python console) and see what’s going on, from the Heroku CLI. Any print statements, or logger outputs from your code will display here just as they do in the console on your local machine.
View logs:
heroku logs --tail
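Since the log shows whatever your app writes to the console, it can help to set up Python’s standard logging module to write to stdout. Below is a minimal sketch of that; the logger name ‘my_dash_app’ and the messages are placeholders I made up.

```python
import logging
import sys

# Send log records to stdout so they appear alongside print() output.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("my_dash_app")  # hypothetical logger name
logger.setLevel(logging.INFO)

logger.info("app starting up")
print("plain print statements show up too", flush=True)
```

The flush=True on the print is there so the line is written straight away rather than sitting in a buffer.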
Check for things like “Module not found errors” and simple things like that. The most common problems I’ve found are forgetting to add packages to my requirements.txt file because I frantically installed them to my local machine with conda/pip to get something working. If you’ve found some obvious problems, fix them, repush your code to GitHub, and then redeploy from Heroku CLI with “git push heroku main”.
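To catch forgotten packages before Heroku does, you could compare the imports in your code against your requirements.txt. The sketch below is a rough, assumption-laden version of that idea: both helper functions are hypothetical, the example source and requirements strings are made up, and it will give false alarms where the import name differs from the package name (for example ‘sklearn’ versus ‘scikit-learn’).

```python
import ast
import re


def imported_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def required_packages(requirements_text):
    """Return lower-cased package names from requirements.txt-style text."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#")[0].strip()  # drop comments and whitespace
        if line:
            names.add(re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0].lower())
    return names


# Made-up example inputs; in practice you would read app.py and requirements.txt.
source = "import os\nimport dash\nfrom whitenoise import WhiteNoise\n"
requirements = "dash>=1.0\ngunicorn\n"
ignore = {"os"}  # standard-library modules never belong in requirements.txt
missing = imported_modules(source) - ignore - required_packages(requirements)
print(sorted(missing))
```

In this made-up example whitenoise is imported but not listed, so it is the one flagged.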
Notably though, the first time is the hardest. Errors in your Procfile can still cause Heroku to deploy successfully, but the dynos will crash or fail to start, so definitely check the Procfile. By now the sun is likely coming up and it’s a work day. But it’ll all be worth it when you see your app hosted publicly, so carry on.
Special note: if there are no changes to your repo, Heroku will not deploy. Which makes sense. So if you have a repo cloned onto your local machine, and you are making changes, be sure to commit and push changes to your GitHub repo first (either with command line or with GitHub Desktop), then in the Heroku CLI terminal, just type in the deploy command.
HEROKU TIP: Useful commands from Heroku CLI
Below is my list of critically important tips, pitfalls, and pitfall solutions when using Heroku.
Explicitly referencing your app name:
Note Heroku can sometimes be funny about requiring you to explicitly specify your app in the command. If you just have a single heroku app, often you can avoid it. But often you may need to append “-a <yourapp>” to the command.
Display current apps:
heroku apps
Display current dynos:
heroku ps
heroku ps -a <yourapp>
Scale dynos:
heroku ps:scale web=2:standard-2x
In this case we are provisioning two standard-2x dynos to run concurrently. Special note, if WEB_CONCURRENCY=4, this means each Dyno can serve 4 simultaneous HTTP incoming requests, meaning your whole application can serve 8 concurrent requests; the benefit of horizontal scaling.
Run bash terminal:
heroku run bash -a <yourapp>
Restart dynos:
heroku dyno:restart
Add additional log metrics:
heroku labs:enable log-runtime-metrics
View logs:
heroku logs --tail
HEROKU TIP: Add log-runtime-metrics to log
From the Heroku CLI (once logged in) when you have deployed your app, you can view a live log tail by typing
heroku logs --tail
Repeated just in case you missed it. This essentially gives you your console output. One thing I’d suggest is adding in a new feature that outputs resources statistics of your dyno(s) timestamped every 20 seconds, like memory levels, cpu load etc, which is very useful. Type this in the Heroku CLI, to permanently add it:
heroku labs:enable log-runtime-metrics
HEROKU PITFALL: Serving static files does not work
I repeat: serving static files DOES NOT WORK. Something of paramount importance that is not obvious, is that out-of-the-box, Heroku (I think more correctly: Gunicorn itself) does not natively support serving static files. This means, whilst your python application itself can access files in any subfolder in your project folder (such as .csv files and the like) it’s a very different story to actually serve them via http in the client browser.
This means any images, video, audio, anything you are currently serving from your ‘localhost’ webserver will fail on deployment with Heroku. I believe this is a quirk of the PaaS model in that files themselves are not stored in the traditional way you would imagine them to be on a file system so there are issues with low level headers that are attached to files, and/or Gunicorn itself does not natively support serving static files. In any regard, there is magic under the hood.
As an aside, if you don’t already know from the docs, it’s important to understand that the Heroku file system is not persistent. Like many of my past relationships, Heroku’s file system is ephemeral or transient. It lasts about as long as a one night stand. With the exception of the files you deploy with your repo (e.g. csv, json files etc), any new files created at runtime will disappear whenever the dyno restarts, which happens at least about once a day and on every deploy.
Anyway, to store and serve persistent static files, as I said any files uploaded to Heroku as part of your project file suite will be fine and persistent, and accessible by your dash app internally. BUT, the moment you want to serve static files externally to your Heroku-Dash deployment, you will rapidly run into problems. There are two main solutions, one is simple and fast.
Solutions:
- Host your files on a 3rd party like S3, Cloudfront and link the URL in your dash app (Worth doing if you will be hosting a serious footprint of files)
- Use the Whitenoise library. Quick and easy. A few lines of code and you’re serving files in the way you would imagine.
Personally I found whitenoise to be a life saver. Literally “pip install whitenoise” (and make sure it’s in your requirements.txt) and you’re almost there. Two lines of code needed in your dash app:
from whitenoise import WhiteNoise

server = app.server  # the underlying Flask app
server.wsgi_app = WhiteNoise(server.wsgi_app, root='static/')
You should already have the server=app.server anyway as this is needed by Gunicorn and for the Procfile. What this essentially does is set a folder (which you must create) called “/static” in your root. Everything contained within this (including subfolders) can be statically served by Heroku. Images, videos, pdfs, whatever the hell you want. Just note that file names on Heroku are case-sensitive (it’s Linux underneath), including the extension. So blah.png is different to blah.PNG.
Also, don’t try to get smart and change the ‘static’ folder name in the whitenoise code to some arbitrary name or ‘assets’ or anything like that: in my experience it has to be ‘static’, which I believe is due to an underlying Flask constraint. Period.
This seems like a pretty major issue that I don’t think much documentation exists on. I spent a long time on Stackoverflow looking it up. I really think it should be a sticky thread on the Plotly Forums or something.
Also, the Whitenoise documentation is not specific to Dash, it is more focused on general Python apps which are typically Flask apps. This means that it’s still not obvious what you need to do, and the code snippets will not work without modification. For example whitenoise states for Flask apps, you must add the following code to your app:
app.wsgi_app = WhiteNoise(app.wsgi_app, root='static/')
This won’t work for your Dash app. In this case ‘app’ is the Flask app. So in a Dash app (which sits on top of Flask) you actually need to replace the ‘app’ with ‘app.server’ in the snippet above to reference the underlying Flask app and for whitenoise to work. Or simply define a variable such as ‘server = app.server’ and use the code snippet I outlined at the beginning of this section.
Again, lots of these things are a 2 second fix if you know how. But can cost you literally HOURS AND HOURS……AND HOURS of time if you don’t know. Trivial for Flask developers. Not trivial at all for newcomers.
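If the ‘server.wsgi_app = WhiteNoise(server.wsgi_app, ...)’ line looks like black magic, it may help to see the general idea of wrapping one WSGI app inside another. The toy class below is emphatically not how WhiteNoise is implemented (it has no caching, compression or proper header handling), and TinyStaticMiddleware and fallback_app are names I invented. It just shows a wrapper answering requests for files it finds in a folder and passing everything else through to the wrapped app.

```python
import mimetypes
from pathlib import Path


class TinyStaticMiddleware:
    """Toy WSGI wrapper: serve files under `root`, else defer to the wrapped app."""

    def __init__(self, wrapped_app, root):
        self.wrapped_app = wrapped_app
        self.root = Path(root).resolve()

    def __call__(self, environ, start_response):
        relative = environ.get("PATH_INFO", "/").lstrip("/")
        candidate = (self.root / relative).resolve()
        inside_root = self.root in candidate.parents  # blocks '../' escapes
        if relative and inside_root and candidate.is_file():
            content_type = mimetypes.guess_type(candidate.name)[0] or "application/octet-stream"
            start_response("200 OK", [("Content-Type", content_type)])
            return [candidate.read_bytes()]
        # Not a static file: hand the request to the original app (e.g. Flask).
        return self.wrapped_app(environ, start_response)


def fallback_app(environ, start_response):
    """Stand-in for the Flask app underneath Dash."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"handled by the wrapped app"]


wrapped = TinyStaticMiddleware(fallback_app, root="static/")
```

In a real project, stick with whitenoise itself; this is only here to demystify what “wrapping the wsgi_app” means.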
HEROKU PITFALL: Favicon may not work
For some reason I had lots of trouble with this. Anyway I managed to get it going by simply having the file at:
/assets/favicon.ico
From my root project directory. Special note that no other static files are served from here, it’s a stand-alone folder. In fact, don’t be lulled into thinking you can serve static files from your /assets folder on Heroku: you can’t. (see whitenoise section). Others have had problems with Heroku changing the extension name of the favicon causing it to fail. One failsafe option to note is you can in fact log into a Heroku Bash shell after you have deployed, and navigate to all your project folders/files to see what Heroku sees. See this post.
From heroku CLI:
heroku run bash -a <yourappname>
This will provision a new Dyno container running a Bash shell. Basically it’s a terminal to your deployed app.
HEROKU PITFALL: Web concurrency is important and can be configured
There is lots of ‘worker’ and ‘web’ terminology that gets confusing. Out of the box when using Gunicorn as your Python HTTP server, Heroku essentially guesses how many concurrent web-worker-processes to run for each dyno instance running your web app. Typically this is 1-6 concurrent ‘gunicorn-worker-web-processes’ per dyno for the commonly used hobby to standard-2x dynos. This is how many client requests (i.e. from a web browser) can be simultaneously served by your app at an instantaneous point in time.
A gunicorn web-worker-process is a process capable of serving a single HTTP request at a time. So if you only had one, your website becomes quite unresponsive with a few users making simultaneous requests, as they have to wait for these requests to be actioned from a queue. Essentially this is what Gunicorn does: it forks the main web process running on its dyno into multiple worker processes (separate processes rather than threads, by default). Web concurrency in Heroku allows each dyno instance to essentially carve up its resources to serve multiple concurrent HTTP requests, which it calls WEB_CONCURRENCY. The problem is, this can sometimes lead to underestimating resources needed, and running over dyno memory limits, causing failure, restarts, massive slow downs due to disk swap having to be used etc. Basically you don’t want to have too much web concurrency because it might break your dyno.
As I said, you don’t need to worry about this day 1, your app will work. But as you start load testing it, you may find you run into memory overrun issues and all sorts of things like that. If you have a high horsepower python application that chews resources, suggest you manually set your WEB_CONCURRENCY variable in heroku command line.
For example:
heroku config:set WEB_CONCURRENCY=3
heroku config:set WEB_CONCURRENCY=3 -a <herokuappname>
If performance is not compromised, you can increase web concurrency to increase the number of clients you can serve in parallel, while minimising dyno cost. If you need to serve more, you can scale dynos horizontally knowing that each one can serve an explicit number of concurrent HTTP requests.
And of course you can monitor this with “heroku logs --tail” or in the Heroku dashboard METRICS section.
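The trade-off above is just arithmetic, so here is a small sketch of it. The concurrency_plan function is hypothetical, and the figures (150 MB per worker, 1024 MB per dyno) are assumptions for illustration; measure your own app’s memory with log-runtime-metrics rather than trusting my numbers. It also assumes each worker handles one request at a time, as described above.

```python
def concurrency_plan(dynos, workers_per_dyno, mb_per_worker, dyno_ram_mb):
    """Rough capacity and memory check for a web formation (illustrative numbers only)."""
    memory_needed_mb = workers_per_dyno * mb_per_worker
    return {
        "concurrent_requests": dynos * workers_per_dyno,
        "memory_needed_mb": memory_needed_mb,
        "fits_in_ram": memory_needed_mb <= dyno_ram_mb,
    }


# Two dynos with WEB_CONCURRENCY=4, assuming ~150 MB per worker and 1024 MB of RAM per dyno.
print(concurrency_plan(dynos=2, workers_per_dyno=4, mb_per_worker=150, dyno_ram_mb=1024))
```

With these assumed numbers you get 8 concurrent requests across the app and 600 MB needed per dyno, which fits. Double the per-worker memory and it no longer does, which is when you would drop WEB_CONCURRENCY or move to a bigger dyno.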
HEROKU PITFALL: Hard limit 30 second request timeout
It’s important to be aware that Heroku has an unchangeable 30 second timeout for serving HTTP requests. This is a common problem especially encountered by Dash users because many of the data science applications have long load times, see this post. These might work fine running on your localhost, but be aware your Heroku deployed app must be able to serve within 30 seconds or it will time out. Heroku docs state a few work arounds but take special note of this problem.
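One way to spot trouble before you deploy is to time your slow functions locally and log a warning when they get anywhere near the limit. The decorator below is a sketch of that idea only; warn_if_slow and load_data are hypothetical names, and the example uses a tiny threshold and sleep purely so it runs quickly.

```python
import functools
import logging
import time

HEROKU_TIMEOUT_SECONDS = 30  # the request timeout described above


def warn_if_slow(threshold_seconds=HEROKU_TIMEOUT_SECONDS * 0.5):
    """Decorator that logs a warning when the wrapped function runs too long."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                if elapsed > threshold_seconds:
                    logging.warning("%s took %.1fs", func.__name__, elapsed)
        return wrapper
    return decorator


@warn_if_slow(threshold_seconds=0.01)
def load_data():
    """Hypothetical slow step, such as loading a large dataset."""
    time.sleep(0.05)
    return "data"


load_data()
```

This only measures the time spent inside your function, not network time, so treat the default threshold of half the limit as a rough safety margin rather than a guarantee.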
HEROKU PITFALL: Develop on the master/main branch of your GitHub repository
If you are new to GitHub, just know that you can have multiple ‘branches’ of your project as you might take it in different directions. These can be merged or left as separate branches. The central branch by default is called master or main in GitHub. When you create your Heroku app it interacts with your GitHub repository to create a kind of Heroku mirror image behind the scenes. If you are developing your current code on a branch that is not master or main, prepare for pain. It’s not that it can’t be done, I just had a lot of trouble with this when trying to deploy to Heroku and found the best rule of thumb is to just develop all my code on the default ‘main/master’ branch in my GitHub repository.
Custom Domain
It’s not too difficult to setup a custom domain for your Heroku app. Obviously you need to purchase a domain first. Once you’ve done that, the provider will typically have a portal where you can login and adjust settings.
Heroku will generate a unique DNS target in the SETTINGS area of the dashboard, once logged in.
What you need to do is copy this DNS target from the Heroku portal (settings page), then log in to your domain provider portal (e.g. Namecheap) and, for your domain, create a new “CNAME record” with host “www” and value “Animate-salamander-8duwlndghfqbtj0t90uep8bmu.herokudns.com” (your unique Heroku DNS target).
If it worked ok, in a few hrs your new domain should work!
Essentially, when someone types your actual domain name www.blah.com, the CNAME record tells DNS to resolve it to the Heroku DNS target, which points the incoming HTTP request to Heroku infrastructure, which then serves the actual page (as if you’d typed in blah.herokuapp.com). This was not entirely obvious to me as a newcomer. Again I fumbled my way through this as a first timer and it was pretty painful despite good documentation.
Flask Caching on Heroku
If you have Flask Caching running on your local machine, it’s straightforward to set up on Heroku with a free Memcachier account, and the docs are good. You can cache to the ephemeral Heroku file system without Memcachier, noting you might max out your 500MB of dyno storage; otherwise you can get 100MB of free high-performance cache via Memcachier.
Getting Fancy with security and autoscaling etc
When you want to go to the next level and set up auto-scaling of machines, proper security/authentication etc, I think this is when it starts becoming worth considering Dash Enterprise OR going down the path of provisioning your own virtual machines, setting up containerised pipelines using Docker and Kubernetes, and managing autoscaling with Rancher, for example. It’s DevOps territory.
For the newbies and hobbyists like me out there, I sincerely hope this has helped you get your project up and running faster with less pain.
Cheers
Dan