Which database hosting service/solution do you use?

Which database hosting service/solution do you use for your Dash apps? I’ve gotten to a point where my database is just a hair too big for feasible hardcoding (~300 MB).

I’ve looked at AWS Aurora/RDS, Heroku, and some others, and I still feel confused about how to calculate the pricing. These services also feel like overkill for what I need, so I thought I would reach out to other Dash app creators to see what they are doing.

Once again, my database is about 300 MB and will be serving only read queries.

Thanks for your help!


Not one to spam the community, but I would like to try bumping this one time in hopes that someone who has a solution sees this.

Thanks!


I denormalize my data and run Elasticsearch; it works decently so far.

For Dash apps I’ve built through my job, I’ve deployed on Amazon EC2 and RDS (with some of my company’s general infrastructure sitting in between that’s not so relevant here).

On my own time I’ve tried AWS Elastic Beanstalk, which worked fine, but I’ve had a really good experience with Heroku. Very easy to configure and add databases, etc. It seems like there are quite a few options depending on expected usage, so I think you could probably get a decent price. The free tiers are also usually more than adequate for development.

Yeah, to me it feels like a hosted solution is definitely too intense for 300 MB or so; generally that amount of data would be nearly free. When you say “hardcode,” do you mean a static DB file, a file directory, etc., or do you mean configuring a DB running on your VM?

I have a few $10/mo VMs running a few different database servers. I like Azure and DigitalOcean. I often use MongoDB for flexible things (especially as a key-value store with simplekv: https://github.com/mbr/simplekv), though I don’t like the 16 MB document limit - it’s hard to store data blobs. For more structured things I use PostgreSQL, or even SQLite depending on how quickly things will be accessed.
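
Roughly, the key-value pattern looks like this - a minimal sketch using pymongo directly rather than simplekv’s actual API; the connection URI, database, and collection names are placeholders:

```python
# Hypothetical sketch of MongoDB as a key-value store via pymongo
# (simplekv wraps the same idea behind a uniform store interface).
from pymongo import MongoClient

kv = MongoClient("mongodb://localhost:27017")["mydb"]["kv"]

# put: upsert a document keyed on _id
kv.replace_one({"_id": "my-key"}, {"_id": "my-key", "value": {"a": 1}}, upsert=True)

# get: fetch it back by _id (keep in mind the ~16 MB per-document cap)
doc = kv.find_one({"_id": "my-key"})
print(doc["value"])
```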

I have one implementation of a pickled key-value store with each individual file being on Amazon S3. I don’t recommend this.

I have started using Elasticsearch to replace MongoDB. Its basic Python API is simple and very good, its search is crazy fast, and the document size limit is 2 GB, which is great for larger data files as well as smaller documents (though I still have to use Parquet files and Apache Plasma for the largest files). I recommend it to everyone these days. Like MongoDB, you can fetch an item’s value by its _id, which basically turns it into a key-value store, but with search.
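
The _id trick looks roughly like this - a minimal sketch assuming the elasticsearch-py 8.x client and a local cluster; the “blobs” index name is a placeholder:

```python
# Minimal sketch: Elasticsearch as a key-value store with search on top.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# "Write": index a document under an explicit _id, just like a KV put.
es.index(index="blobs", id="my-key", document={"value": [1, 2, 3]})

# "Read": fetch it back directly by _id - no search query needed.
doc = es.get(index="blobs", id="my-key")
print(doc["_source"]["value"])  # [1, 2, 3]
```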

It is a static SQLite DB file.

$10/mo did seem like the standard, but I felt that it was overkill, especially since I would barely be scratching the surface of its usage. If I just wanted to run my own server that did nothing but hold the SQLite DB, is there anything a bit more practical? I was hoping it wouldn’t come out to more than a dollar or so each month.
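
For context, reading from it is just a read-only connection wherever the file ends up living - a minimal sketch with placeholder file and table names:

```python
# Minimal sketch: read-only queries against a static SQLite file.
# "data.db" and "my_table" are placeholders; mode=ro opens the file
# read-only, which suits a read-query-only workload.
import sqlite3
import pandas as pd

conn = sqlite3.connect("file:data.db?mode=ro", uri=True)
df = pd.read_sql_query("SELECT * FROM my_table LIMIT 10", conn)
conn.close()
print(df.head())
```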

Thanks for the pointer. I also looked into Heroku, but since my table has way more than 10k rows it puts me into the $9/month category. I might be considered a cheap-o here, but that just seems unnecessary for what this is doing.

I’ve deployed on Amazon EC2 and RDS

What is the difference here? I get so lost in all the options. At a fundamental level, I get that the former is just a server that can hold your database, in addition to anything else you want, but I get lost on what the latter is trying to solve and what the tradeoffs would be.

Hi,

if it’s “just” 300 MB and “just” a simple DB, maybe switching to MongoDB might be an option.
They have a hosted service called Atlas (it actually runs on GCP, AWS, or Azure),
and they have a free plan for a simple 3-node cluster with a max of 500 MB.

I use it for test DBs and have had no problems. It’s easy to use with Python.
Check the blog post: https://www.mongodb.com/blog/post/new-to-mongodb-atlas--get-started-with-free-database-tier-on-microsoft-azure
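
Connecting from Python is just a pymongo client pointed at the URI that Atlas generates for your cluster - a minimal sketch with a placeholder connection string and collection names:

```python
# Minimal sketch: reading from a MongoDB Atlas free-tier cluster.
# The mongodb+srv:// URI below is a placeholder; Atlas shows you the
# real one in its UI (srv URIs also need the dnspython extra:
# pip install "pymongo[srv]").
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:password@cluster0.example.mongodb.net/")
collection = client["mydb"]["mycollection"]

# Read-only usage, matching the original poster's workload.
for doc in collection.find({"ticker": "AAPL"}).limit(5):
    print(doc)
```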


Thanks for the pointer. I also looked into Heroku, but since my table has way more than 10k rows it puts me into the $9/month category. I might be considered a cheap-o here, but that just seems unnecessary for what this is doing.

Fair enough. I have pretty limited experience with other options, I’m afraid, so I wouldn’t want to speak authoritatively on them.

What is the difference here?

This is kind of at the edge of my understanding, but while EC2 is a fairly general service that just provides you with compute (i.e. servers that you more or less have full control over), RDS is a managed database service that runs on the same kind of servers you can provision as EC2 instances. The primary difference, therefore, is that Amazon manages the installation and configuration of the database and provides options for automatic backups, migrations, etc., whereas if you were only using EC2 instances you would have to take care of all of that yourself.

This looks like a good option. I’ll compare it with DynamoDB, which is also NoSQL and has a 25 GB free tier.

Thanks

You can use a MongoDB instance to store the relevant data and deploy it on an AWS EC2 instance.

I host my apps on Heroku (trying not to preload data but to use as much plain Python as possible). However, if the memory usage escalates, then I use my own job’s infrastructure (basically one of my internal servers) to host my app, linked to a .to or whatever free domain option I find.

I have updated this post with an improved method and thought I would share.

1. NoSQL object storage via Google Firestore (i.e. the old solution):

Thank you to everyone for your replies! While I wish I could respond and thank everyone individually, I also wouldn’t want to make this thread unwieldy. All in all, the solution I decided to go with was a NoSQL key-value storage service. Using any type of relational DBaaS, or a VM instance with my own DB running, started at around 10 USD/month, and even then my needs barely scratched the surface of the quotas.

For those interested in the finer details:

I went with Firestore, which is provided by Google Cloud Platform. I initially looked into AWS’s DynamoDB, but I found it all so overwhelming when it came to understanding pricing. With GCP things are relatively straightforward in that arena – there is no need to set up a billing account to try things out in the free tier (the API will simply throw an exception if you max out your quotas), and you have the option to easily set limits in place when you do add billing info. That said, the free tier is pretty generous. I don’t know how competitive the pricing is, but I feel that any difference would be immaterial for my needs anyway.

The biggest appeal for me was normalizing the data for document/collection storage, since there are clever ways to design the objects for optimum storage. In the case of GCP, you have a 1 MiB per-document limit, and they are very transparent about how many bytes each data type takes up. For example, I was able to bring the total writes for a ~1.2M-row timeseries database down to just 805 entries. Very different from the traditional relational SQL paradigm.
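
To sketch the row-packing idea (collection names, field names, and the chunk size here are purely illustrative, and it assumes GCP credentials are already configured; the real constraint is the 1 MiB per-document limit):

```python
# Minimal sketch: packing many timeseries rows into each Firestore
# document, using the google-cloud-firestore client.
from google.cloud import firestore

db = firestore.Client()

# Placeholder data standing in for the ~1.2M-row table.
rows = [(f"2020-01-{d:02d}", float(d)) for d in range(1, 31)]

CHUNK = 1500  # tune so each packed document stays safely under 1 MiB
for i in range(0, len(rows), CHUNK):
    chunk = rows[i:i + CHUNK]
    db.collection("timeseries").document(f"chunk-{i // CHUNK}").set({
        "timestamps": [t for t, _ in chunk],
        "values": [v for _, v in chunk],
    })
```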

I don’t advocate one over the other, but for my purposes, this was the ideal option.

2. Cache memoization with timeout=0

It may help if I add a little more context on my deployment workflow, which is that I have no control over the deployment server (unfortunately). Instead, I have to send my source code in version control to our vendors. This makes things a bit tricky since I don’t have direct control of the environment to store relatively simple datasets, and sending the data in source is not feasible.

Since I don’t need to do any CRUD operations on the data, it doesn’t need to be a database, let alone have the overhead of a VM service. Instead, I created a function that pulls the data from its online source and returns it as a dataframe. The key to this solution is that this function also gets cached, but with the timeout explicitly set to 0. This ensures that the data doesn’t have to be re-pulled, at least until the cache is cleared. I got my inspiration from this community post, so do check that out if you need more information about using caching.
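
A minimal sketch of the pattern, assuming Flask-Caching (the library the Dash docs use for memoization); the data URL is a placeholder for the real online source:

```python
# Minimal sketch: pull-once-then-cache with Flask-Caching in a Dash app.
import dash
import pandas as pd
from flask_caching import Cache

app = dash.Dash(__name__)
cache = Cache(app.server, config={"CACHE_TYPE": "filesystem", "CACHE_DIR": "./cache"})

DATA_URL = "https://example.com/data.csv"  # placeholder

@cache.memoize(timeout=0)  # timeout=0: never expires until the cache is cleared
def get_data():
    """Pull the dataset once; subsequent calls hit the filesystem cache."""
    return pd.read_csv(DATA_URL)

# Callbacks can then call get_data() freely without re-pulling the data.
```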

On a related note to my deployment workflow, it would be really nice if someone from @plotly could reach out about the premium support plans. I’ve inquired a couple of times over the past month or so and have yet to hear back from anybody. I think Plotly is doing a great job with open source development, but I really do need some premium support with deployment options, especially the Dash portal framework.

Let me know if you stumble across this and have any questions!
