APScheduler is executing job multiple times even with --preload (gunicorn)

I am using APScheduler’s BackgroundScheduler to run a job that makes an API call and does a few other things. The job runs, but it is executed three times because I am using three workers (which makes sense). Based on what I have read, I added --preload, but now the scheduled job runs four times.

gunicorn

gunicorn --bind=0.0.0.0 --timeout=120 --workers=3 --worker-connections=1000 --worker-class=gevent --preload app:server

app.py

app = DashProxy(__name__, external_stylesheets=external_stylesheets,
                title='Dashboard', use_pages=True,
                transforms=[ServersideOutputTransform()])
server = app.server
auth = GoogleAuth(app)  # Google OAuth

# Dash layout, callbacks, etc. here

def scheduled_job():
    try:
        dd.update_data()
        # remove backend files
        for file in glob.glob(pathname='file_system_backend/*', include_hidden=True):
            os.remove(file)
        SMTP.send_email(subject='Dashboard Updated',
                        body='The dashboard has been successfully updated.',
                        recipients='xxxxxx')
    # Send an email on error
    except Exception as err:
        SMTP.send_email(subject=f'Dashboard {type(err)}',
                        body=str(err),
                        recipients='xxxxxx')



scheduler = BackgroundScheduler(timezone='UTC')
scheduler.add_job(func=scheduled_job, trigger='cron', hour=8, minute=0)
scheduler.start()

if __name__ == '__main__':
    app.run(debug=False)

Any advice on how to have the scheduled job execute only once? I do not need to use APScheduler if there are other suggestions.

You could use a lock (potentially combined with writing a status flag to a shared memory space) to ensure that only one of the workers runs the job.
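
A minimal sketch of the lock idea, assuming a Unix host (it relies on `fcntl.flock`) and assuming the function is called once in each worker process after it has been forked. The lock path and the helper names `try_acquire_lock` and `start_scheduler_once` are made up for illustration; `start_scheduler` stands in for whatever code creates and starts your BackgroundScheduler.

```python
import fcntl
import os
import tempfile

# Hypothetical lock path; any file visible to all workers on the host would do.
LOCK_PATH = os.path.join(tempfile.gettempdir(), 'dashboard_scheduler.lock')


def try_acquire_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is already taken."""
    handle = open(path, 'w')
    try:
        # LOCK_NB makes flock raise instead of blocking when the lock is held elsewhere.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    # The caller must keep this reference: closing the file releases the lock.
    return handle


def start_scheduler_once(start_scheduler, path=LOCK_PATH):
    """Call start_scheduler() only in the process that wins the lock."""
    handle = try_acquire_lock(path)
    if handle is None:
        return None  # another worker already owns the scheduler
    start_scheduler()
    return handle
```

The worker that gets the lock keeps the returned file object alive for as long as it runs; the other workers get `None` and skip starting the scheduler. If the owning worker dies, the operating system releases the lock, but nothing in this sketch restarts the scheduler elsewhere, so treat it as a starting point only.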


Turns out part of the issue was related to monkey patching with gevent.monkey.patch_all(). Calling patch_all() before scheduler.start() caused the scheduler to start in every worker. I ended up using only gevent.monkey.patch_ssl(), which I need because of urllib3 and requests. I also moved my scheduled job to my gunicorn config file, using the when_ready function.

config.py

from gevent import monkey
monkey.patch_ssl()  # monkey patch ssl before requests gets imported
import DashboardData as dd
from apscheduler.schedulers.background import BackgroundScheduler
import SMTP
import glob
import os
import multiprocessing

bind = '0.0.0.0'
workers = multiprocessing.cpu_count() * 2 + 1


def scheduled_job():
    """Schedule the exports"""
    try:
        dd.update_data()

        # remove backend files that are saved on the server every day
        for file in glob.glob(pathname='file_system_backend/*', include_hidden=True):
            os.remove(file)

        SMTP.send_email(subject='Dashboard Updated',
                        body='The dashboard has been successfully updated.',
                        recipients='xxxxx')

    except Exception as err:
        SMTP.send_email(subject=f'Dashboard {type(err)}',
                        body=str(err),
                        recipients='xxxxxx')


def when_ready(server):
    """Called just after the server is started."""
    scheduler = BackgroundScheduler(timezone='UTC')
    scheduler.add_job(func=scheduled_job, trigger='cron', hour=8, minute=0)
    scheduler.start()

gunicorn

gunicorn --config=config.py --worker-class=gevent app:server