I am using Dash to build a search engine tool for company reports, with Whoosh handling the indexing and searching. I am very confident Whoosh is not the issue, but I am not 100% sure.
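For context, the index itself is just a plain Whoosh directory index that the batch script builds. Creating it looks roughly like this; this is only a sketch, and the schema fields (report_id, report_date, content) are placeholders rather than my real schema:

import os
from whoosh.fields import Schema, TEXT, ID, DATETIME
from whoosh.index import create_in

# Placeholder schema for the report index
schema = Schema(
    report_id=ID(stored=True, unique=True),
    report_date=DATETIME(stored=True),
    content=TEXT(stored=True),
)

index_dir = os.path.join("assets", "testUpdate2023")
os.makedirs(index_dir, exist_ok=True)
ix = create_in(index_dir, schema)  # one-time creation; later reopened with open_dir()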
Either way, up to now, each morning I have run a batch script that gathers the previous day's new reports, writes/appends them to the search index file, and redeploys the app to the server with the updated data. Dash supports automatic data updates, so I followed their examples and created a tasks.py that retrieves reports from our database and writes them to an index file. Note that I am not writing anything to the on-server Postgres DB; I only write to the index file referenced in the project folder. I kept their connection code because I did not know what could be removed and what needed to stay. I have shortened my task below, but it connects to the DB, pulls data, writes to the filename referenced, and prints a second check:
# Copy of their tasks.py
import os
from urllib.parse import urlparse

from celery import Celery
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

if os.environ.get("DASH_ENTERPRISE_ENV") == "WORKSPACE":
    parsed_url = urlparse(os.environ.get("REDIS_URL"))
    if parsed_url.path == "" or parsed_url.path == "/":
        i = 0
    else:
        try:
            i = int(parsed_url.path[1:])
        except ValueError:
            raise Exception("Redis database should be a number")
    parsed_url = parsed_url._replace(path="/{}".format((i + 1) % 16))
    updated_url = parsed_url.geturl()
    REDIS_URL = "redis://%s" % (updated_url.split("://")[1])
else:
    REDIS_URL = os.environ.get("REDIS_URL", "redis://dataupdateredis:99429b1023166af3d3de765f24a9d06398b95f9bd88b83aebd969a65fa216fc0@dokku-redis-dataupdateredis:6379")

celery_app = Celery(
    "Celery App", broker=REDIS_URL
)

connection_string = "postgresql+pg8000" + os.environ.get(
    "DATABASE_URL", "postgres://postgres:d6832cb53032a6819d0c01b707d571c5@dokku-postgres-dataupdatetest:5432/dataupdatetest"
).lstrip("postgresql")
postgres_engine = create_engine(connection_string, poolclass=NullPool)

@celery_app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    # Invokes the celery task every 5 minutes (300 s). You can change this.
    sender.add_periodic_task(300, update_data.s(), name="Update data")
# MY TASK
@celery_app.task
def update_data(if_exists="append"):
    filename = os.path.join('assets', 'testUpdate2023')
    print("TEST 1")
    indexTest()
    print("Filename Check: ", filename)
    modtime = modification_date(filename)
    modtime = modtime.strftime('%m/%d/%Y %H:%M:%S')
    print(modtime)
    # Database connect
    # Pull data
    # Write to the file above
    # Print second check (still inside the task)
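For reference, the elided steps above boil down to pulling new rows from Postgres and appending them to the Whoosh index. Roughly this, where the query, column names, and index fields are placeholders rather than my real ones:

import os
import pandas as pd
from whoosh.index import open_dir

def write_new_reports():
    # postgres_engine comes from the connection code above; placeholder query
    df = pd.read_sql("SELECT * FROM reports WHERE report_date >= CURRENT_DATE - 1", postgres_engine)
    ix = open_dir(os.path.join('assets', 'testUpdate2023'))
    writer = ix.writer()
    for row in df.itertuples():
        writer.add_document(report_id=str(row.id),        # placeholder fields
                            report_date=row.report_date,
                            content=row.text)
    writer.commit()  # new documents only become visible after commit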
Long-winded to get here, but this all works: in the server logs I can see data coming in when the task runs, and the second check prints the new data…
However, when I go to the app, the new data does not appear. Per Plotly's recommendations, I reference both the index file and the layout as functions so they are re-evaluated on page load, but with no luck. This is a multipage app, so @app and a few other pieces might look a little different from a single-page app; everything works as expected other than not showing the new data.
def serveIndex():
    filename = os.path.join('assets', 'testUpdate2023')
    print(os.path.getmtime(filename))
    myIndex = open_dir(filename)
    print("Last Mod", myIndex.last_modified())
    print("Up to date", myIndex.up_to_date())
    return myIndex

def layout():
    return html.Div([...])

@callback(
    [Output(component_id='outTableTrend', component_property='data'),
     Output(component_id='outTableTrend', component_property='columns')],
    [Input(component_id='dateRange', component_property='startDate'),
     Input(component_id='dateRange', component_property='endDate')])
def allCrs(start_date, end_date):
    myIndex2 = serveIndex()
    # do stuff to the index to build the desired DataFrame
    return data, columns
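For completeness, the "do stuff" part is essentially a date-filtered Whoosh search whose stored fields get turned into DataTable rows. Roughly like this, with placeholder field names and simplified date parsing:

from datetime import datetime
import pandas as pd
from whoosh.query import DateRange

def search_reports(ix, start_date, end_date):
    # start_date / end_date arrive as ISO strings from the date picker
    query = DateRange("report_date",
                      datetime.fromisoformat(start_date),
                      datetime.fromisoformat(end_date))
    with ix.searcher() as searcher:
        results = searcher.search(query, limit=None)
        df = pd.DataFrame([hit.fields() for hit in results])  # stored fields only
    data = df.to_dict("records")
    columns = [{"name": c, "id": c} for c in df.columns]
    return data, columns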
I still do not get new rows in the app or in the backend logs. Any ideas on what is going on here? I also tried setting up gunicorn to reload/restart, with no luck. The page only reads the data from the last deploy, even though I can see the updated data and modification times in the task logs.