Reuse a Postgres DB connection throughout a Dash app


I have a datatable storing the run IDs of hundreds of runs, each of which has ~17000 rows of data and multiple columns. To this end, I have stored all of the data in a database hosted on AWS. Within my Dash app, a user can select the run they wish to view and then plot it. Currently, this is done by calling a helper function I've made with the following code:

    import tempfile
    import pandas as pd

    def read_sql_tmpfile(thisquery, db):
        with tempfile.TemporaryFile() as tmpfile:
            copy_sql = "COPY ({query}) TO STDOUT WITH CSV {head}".format(
                query=thisquery, head="HEADER"
            )
            engine = db.engine
            conn = engine.raw_connection()
            try:
                cur = conn.cursor()
                cur.copy_expert(copy_sql, tmpfile)
                tmpfile.seek(0)  # rewind before reading the CSV back
                dbdata = pd.read_csv(tmpfile)
            finally:
                conn.close()
            return dbdata

where dbdata is a dataframe with the data I want.
I’ve found the loading into a tmpfile and reading into dataframe a quick method to get data from SQL command into a dataframe (happy to be corrected?!), but what i think is quite inefficient is the fact that I am opening and closing new connections each time I load new data. This is largely so i can be sure of the connection being closed each time. Is there a better way to handle this? My temptation is to configure the cursor once at initial load in my file, but are there mechanisms by which i can ensure it is closed? Similarly, is there an alternative method that is more widely used here e.g. sessions? Thanks in advance for any help.

To be clear, a lot of this difficulty arises because I want to execute my own SQL command in Dash, as opposed to using pd.read_sql_table(table_name, con=db.engine) etc. If the answer to the above is simply a different way of executing SQL commands in Dash, then that would be welcome too!
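On that last point: pandas can execute arbitrary SELECT statements, not just whole tables, via `pd.read_sql_query` (or `pd.read_sql`). A minimal sketch, using an in-memory SQLite engine as a stand-in for the AWS Postgres engine, with a hypothetical `runs` table:

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Stand-in engine; in the app this would be db.engine pointing at Postgres.
engine = create_engine("sqlite://")

# Hypothetical data, just so the query below has something to select.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE runs (run_id INTEGER, value REAL)"))
    conn.execute(text("INSERT INTO runs VALUES (1, 0.5), (2, 0.7)"))

# Any SQL you like, with bound parameters instead of string formatting,
# which also avoids SQL injection from user-selected run IDs.
df = pd.read_sql_query(
    text("SELECT value FROM runs WHERE run_id = :rid"),
    engine,
    params={"rid": 2},
)
```

Whether this beats the COPY-to-tempfile route for ~17000-row results is worth benchmarking on your data; COPY is often faster for bulk extracts, but `read_sql_query` is simpler and works through the pooled engine.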