DiskcacheManager not working in Dash App in Production. Caching issue?

Good day! This is a continuation of a previous post; despite following the assistance there and spending countless hours, I have been unable to make progress.

I believe it is a caching issue, with something on the server preventing the cache from working, but I am uncertain.

Please note that I am a mere hobbyist, with limited knowledge of servers and the like. I am simply trying to satisfy my curiosity in the wonderful world of Python and Plotly.

Summary

I have

  • A Flask app, served through NGINX and uWSGI
    – Found here: gusmontano.com
    – It is hosted on a DigitalOcean Ubuntu Droplet

  • A Dash app that, loosely speaking, is connected to my Flask app.
    – It is a simple ‘curiosity project’ that converts a YouTube video to an MP3: http://gusmontano.com/p/youtube-to-mp3.
    – The Dash app streams a YouTube video, caches the result in memory, converts it to an MP3, and then releases it to the user.

  • There are no issues when running this directly on the Ubuntu droplet, either through the usual python run.py or through uWSGI.

  • As soon as NGINX is brought in to serve the app, it no longer works.

Investigation: Running the App in 3 Cases

Using python run.py

The app works successfully, with the logs showing:

POST /youtube_to_mp3/_dash-update-component?cacheKey=fe28ff0541f35547ba138fd511b0f82446a07aeb&job=2336913 HTTP/1.1" 200 -

Using uwsgi

uwsgi --socket 0.0.0.0:5001 --protocol=http -w run:app

The app runs successfully, with the logs showing:

[pid: 2336956|app: 0|req: 11/11] 99.98.109.223 () {40 vars in 888 bytes} [Sun Jan 29 22:51:14 2023] POST /youtube_to_mp3/_dash-update-component?cacheKey=a36db1c766c643ac5a94d1733982551e3685c69d&job=2336964 => generated 4922432 bytes in 1197 msecs (HTTP/1.1 200) 2 headers in 76 bytes (12 switches on core 0)

Introducing NGINX

With NGINX, the app no longer works. The page loads, but I believe the button callback never completes, due to problems with caching.
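To rule out a simple permissions problem, a minimal check of whether the cache directory can be written to by the process user would look something like this (a rough sketch, assuming the same relative ./cache path the Dash app uses below):

# cache_check.py - rough sketch to confirm the diskcache directory is usable.
# Run it as the same user that uWSGI/NGINX runs the app under (e.g. www-data).
import getpass
import os

import diskcache

print(f"Running as user: {getpass.getuser()}")
print(f"Working directory: {os.getcwd()}")

# Same relative path as the Dash app; a failure here points at permissions or the CWD.
cache = diskcache.Cache("./cache")
cache.set("probe", "ok", expire=60)
print(f"Read back from cache: {cache.get('probe')}")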

The Flask App

The Flask app, including the hook-up to the Dash app, is shown below.

from flask import Flask

from app.extensions import db, login_manager
from app.blueprints.admin.admin import adm
from app.blueprints.users.view import user
from app.blueprints.auth.auth import auth

import app.models.other_apps.youtube_to_mp3.app as youtube_to_mp3_app

DIGITAL_OCEAN_POSTGRES_CONNECTION_STRING = 'POSTGRESQL_DETAILS'

def create_app():
    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = DIGITAL_OCEAN_POSTGRES_CONNECTION_STRING
    app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False
    app.config["SECRET_KEY"] = "MY_SECRET_KEY"
    db.init_app(app)
    login_manager.init_app(app)
    app.register_blueprint(user)
    app.register_blueprint(adm, url_prefix="/admin")
    app.register_blueprint(auth, url_prefix="/auth")

    # Here comes the Dash app!
    youtube_to_mp3_app.initialise(app)

    return app

The Dash App

The Dash app is shown below. Please ignore the Redis and Celery branch; while I understand the benefits, setting up Celery requires a greater investment of time, and I will read into it later. Instead, I am attempting to solve the diskcache issue in the else statement.

In production, I have observed that the steps in the else statement do not complete. I believe something is wrong with the caching, and I am hoping for assistance here.

from dash import Dash, dcc, Input, Output, State, DiskcacheManager, CeleryManager

from pydub import AudioSegment
from pytube import YouTube, exceptions
from uuid import uuid4
from os import environ
from io import BytesIO

from .layout import app_layout

APP_NAME = 'youtube_to_mp3'
APP_STYLESHEET = "https://cdn.jsdelivr.net/npm/bootswatch@5.2.3/dist/lux/bootstrap.min.css"

launch_uid = uuid4()

def initialise(server):

    if 'REDIS_URL' in environ and False:
        # Use Redis & Celery if REDIS_URL set as an env variable
        from celery import Celery
        celery_app = Celery(__name__, broker=environ['REDIS_URL'], backend=environ['REDIS_URL'])
        background_callback_manager = CeleryManager(celery_app)

    else:
        # Diskcache for non-production apps when developing locally
        import diskcache
        cache = diskcache.Cache("./cache")
        background_callback_manager = DiskcacheManager(cache, cache_by=[lambda: launch_uid], expire=60)

    dash_app = Dash(__name__,
                    server=server,
                    url_base_pathname=f'/{APP_NAME}/',
                    external_stylesheets=[APP_STYLESHEET],
                    background_callback_manager=background_callback_manager
                    )

    dash_app.layout = app_layout

    @dash_app.callback(
        output=Output('downloaded_file', 'data'),
        inputs=Input('button_submit', 'n_clicks'),
        background=True,
        state=State('input_youtube_link', 'value'),
        running=[(Output('button_submit', 'disabled'), True, False)],
        progress=[Output('status', 'children')],
        prevent_initial_call=True,
        manager=background_callback_manager)
    def update_output(set_progress, n_clicks, youtube_link):
        if n_clicks > 0:
            try:
                # Get YouTube Object
                set_progress('Downloading audio...')
                yt = YouTube(youtube_link)
                # Stream audio only, and download to memory
                audio = yt.streams.filter(only_audio=True).first()
                audio_bytes = BytesIO()
                audio.stream_to_buffer(audio_bytes)
                audio_bytes.seek(0)
                # Convert audio to MP3 format
                set_progress('Converting audio to MP3 format...')
                sound = AudioSegment.from_file(file=audio_bytes).export(BytesIO(), 'mp3')
                set_progress('Done!')
                return dcc.send_bytes(sound.getvalue(), f'{yt.streams[0].title}.mp3')
            except exceptions.RegexMatchError:
                set_progress('Inappropriate URL. Check and try again.')

    return dash_app
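One thing worth noting is that diskcache.Cache("./cache") is resolved against the working directory of whichever process starts the app, which may differ between python run.py and a uWSGI service. Pinning the cache to an absolute path removes that variable; a sketch of what I mean (not what is currently deployed):

# Sketch: pin the diskcache directory to an absolute path so it does not depend
# on the working directory of the process (python run.py, uWSGI) that starts the app.
import os
from uuid import uuid4

import diskcache
from dash import DiskcacheManager

launch_uid = uuid4()

# Assumption: keep the cache folder next to this module rather than relative to the CWD.
CACHE_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "cache")

cache = diskcache.Cache(CACHE_DIR)
background_callback_manager = DiskcacheManager(cache, cache_by=[lambda: launch_uid], expire=60)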

NGINX Conf

My NGINX site configuration is shown below:

server {
        listen 80;
        server_name gusmontano.com www.gusmontano.com;

        location / {
                proxy_pass http://localhost:5000;
                include uwsgi_params;
                uwsgi_pass unix:/gusmontanocom/gusmontanocom.sock;
        }

}

nginx.conf

user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

events {
        worker_connections 768;
        # multi_accept on;
}

http {

        ##
        # Basic Settings
        ##

        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 65;
        types_hash_max_size 2048;
        # server_tokens off;

        include /etc/nginx/mime.types;
        default_type application/octet-stream;

        ##
        # SSL Settings
        ##

        ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3; # Dropping SSLv3, ref: POODLE
        ssl_prefer_server_ciphers on;

        ##
        # Logging Settings
        ##

        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;

        ##
        # Gzip Settings
        ##

        gzip on;

        ##
        # Virtual Host Configs
        ##

        include /etc/nginx/conf.d/*.conf;
        include /etc/nginx/sites-enabled/*;

        proxy_read_timeout 3600;
        proxy_connect_timeout 3600;
        proxy_send_timeout 3600;
        uwsgi_read_timeout 3600;

}

I truly appreciate any help here, and am treating this as a great learning exercise in servers and clients!

Kind regards
Gus

Hello @GusMontano,

Just to make sure: are your ports lined up? Is your Dash app running on 5000 and not 5001?

In your steps it looks like the port isn’t lining up.

@jinnyzor - as always, thank you for your help.

In the uWSGI case, to demonstrate what is going on, I used port 5001, as 5000 is being used by my production server.

In this case I have

flask_app -> dash_app(server=flask_app) -> flask_app.run(debug=False)

Running this, I’m assuming Flask defaults to port 5000, and NGINX then listens for all requests and forwards them to port 5000 (in my poor layman’s terms). I assumed everything would be linked.
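For reference, run.py is roughly the following (a simplified sketch; the real file may differ slightly, and the import path is an assumption):

# run.py (simplified sketch; the import path is an assumption)
from app import create_app

app = create_app()  # Flask app with the Dash app mounted at /youtube_to_mp3/

if __name__ == '__main__':
    # Flask's development server defaults to port 5000 when no port is given.
    app.run(debug=False)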

Where would the port for the Dash app go in this instance?

Furthermore, could an inconsistent port number cause a problem only on the caching side of this app in production? I have another Dash app here, The 2023 Houston Marathon, which seems to work fine in this setup (although no caching is required).

Thanks again.

@GusMontano,

What happens if you navigate directly to the :5000 port?

This would go around nginx and verify for sure that you are running the server as expected.

When I run my server, I use NGINX and supervisor with gunicorn, so that it will auto-restart on error.

You can also check your error logs to see if there are any hints as to what is going on. There is also the potential that the file is too large for NGINX to handle.

Hi @jinnyzor - do you mean gusmontano.com:5000? If so, then this doesn’t work! Nor does IP:5000.

Interesting. With my limited knowledge, I had always assumed that every request for gusmontano.com or the IP would lead to port 5000, although explicitly entering the port doesn’t take me anywhere.

For example, if you attempt gusmontano.com:5000, I’m assuming this doesn’t work for you either.

Does this mean anything, specifically to the detriment of my app?

EDIT: Oh wow! NGINX is actually on port 80, as given by listen 80. Therefore, I’m assuming all requests for the website arrive on port 80 and are then passed on to port 5000. Though, should they be the same?

Just checked, and NGINX will complain if I make them the same. The Flask app responds on port 5000, and the Dash app should do the same…

Oh I’m so lost…

server {
        listen 80;
        server_name gusmontano.com www.gusmontano.com;

        location / {
                proxy_pass http://localhost:5000;
                include uwsgi_params;
                uwsgi_pass unix:/gusmontanocom/gusmontanocom.sock;
        }

}

Correct, 80 is regular HTTP and 443 is HTTPS.

These, according to NGINX, will redirect to your :5000. But since :5000 isn’t working directly, something isn’t quite right. You need to look at the error logs for however you are spinning up the server.

I like to use supervisor and gunicorn; with supervisor you can set up error logs and stdout logs too.