To create my dashboard I need to query a BigQuery table that is currently 170k rows (30.7MB).
Because the query can take some time (15-20s), I am running it in a separate worker script (similar to the boilerplate example). Both the worker and the Dash app are deployed to Heroku (the worker runs on a standard-1x dyno with 512MB RAM).
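For reference, the two processes are wired up with a Procfile along these lines (the gunicorn entry point and the worker.py name here are stand-ins for my actual files):

web: gunicorn app:server
worker: python worker.py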
In the worker script, I’m running the query using the pandas-GBQ library…
import pandas_gbq as gbq

# Legacy SQL: pull the entire table for this client
query = 'SELECT * FROM [Customer_Data.{}]'.format(clientID)
df_bq = gbq.read_gbq(query, project_id=project_id, private_key=google_apiKey)
This returns the table as a dataframe. However, in the Heroku logs I’m seeing this error…
2018-04-06T10:59:04.068333+00:00 heroku[worker.1]: Process running mem=675M(131.9%)
2018-04-06T10:59:04.068485+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
…which means the worker is using 675MB against the dyno's 512MB quota, so Heroku has pushed the overflow into swap.
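Worth noting: the 30.7MB figure is the table's size in BigQuery, but pandas can need far more once string columns are loaded as Python objects. A rough way to check the dataframe's true in-memory footprint (a quick sketch, assuming the df_bq from above):

# Total bytes used in memory, including object (string) columns
mem_bytes = df_bq.memory_usage(deep=True).sum()
print('DataFrame in memory: {:.1f}MB'.format(mem_bytes / 1024 ** 2))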
Is it unusual for a query of this size to cause such a spike in RAM? Is this likely to be a bug in the library, or am I simply asking too much of a 512MB dyno?
Thanks,