Which database hosting service/solution do you use?

Which database hosting service/solution do you use for your Dash apps? I’ve gotten to a point where my database is just a hair too big for feasible hardcoding (~300 MB).

I’ve looked at AWS Aurora/RDS, Heroku, and some others, and I still feel confused about how to calculate the pricing. These services also feel like overkill for what I need, so I thought I would reach out to other Dash app creators to see what they are doing.

Once again, my database is about 300 MB and will be serving only read queries.

Thanks for your help!


Not one to spam the community, but I would like to try bumping this one time in hopes that someone who has a solution sees this.

Thanks!


I denormalize my data and run Elasticsearch; it works decently so far.

For Dash apps I’ve built through my job, I’ve deployed on Amazon EC2 and RDS (with some of my company’s general infrastructure sitting in between that’s not so relevant here).

On my own time I’ve tried AWS Elastic Beanstalk, which worked fine, but I’ve had a really good experience with Heroku. Very easy to configure and add databases, etc. It seems like there are quite a few options depending on expected usage, so I think you could probably get a decent price. The free tiers are also usually more than adequate for development.

Yeah, to me it feels like a hosted solution is definitely too intense for 300 MB or so; generally that amount of data would be nearly free. When you say “hardcode,” do you mean a static DB file, a file directory, etc., or do you mean configuring a DB running on your VM?

I have a few $10/mo VMs running a few different database servers. I like Azure and DigitalOcean. I often use MongoDB for flexible things (especially as a key-value store with simplekv: https://github.com/mbr/simplekv), though I don’t like the 16 MB document limit - it’s hard to store data blobs. For more structured things I use PostgreSQL, or even SQLite depending on how quickly things will be accessed.
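
Roughly, the key-value pattern looks like this - a minimal sketch using pymongo directly rather than simplekv’s actual API; the connection URI, database, and collection names are placeholders:

```python
# Hypothetical sketch of MongoDB as a key-value store via pymongo
# (simplekv wraps the same idea behind a uniform store interface).
from pymongo import MongoClient

kv = MongoClient("mongodb://localhost:27017")["mydb"]["kv"]

# put: upsert a document keyed on _id
kv.replace_one({"_id": "my-key"}, {"_id": "my-key", "value": {"a": 1}}, upsert=True)

# get: fetch it back by _id (keep in mind the ~16 MB per-document cap)
doc = kv.find_one({"_id": "my-key"})
print(doc["value"])
```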

I have one implementation of a pickled key-value store with each individual file being on Amazon S3. I don’t recommend this.

I have started using Elasticsearch to replace MongoDB. Its basic Python API is simple and very good, its search is crazy fast, and the document size limit is 2 GB, which is great for larger data files as well as smaller documents (though I still have to use Parquet files and Apache Plasma for the largest files). I recommend it to everyone these days. Like MongoDB, you can fetch an item’s value by its _id, which basically turns it into a key-value store, but with search.
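
The _id trick looks roughly like this - a minimal sketch assuming the elasticsearch-py 8.x client and a local cluster; the “blobs” index name is a placeholder:

```python
# Minimal sketch: Elasticsearch as a key-value store with search on top.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# "Write": index a document under an explicit _id, just like a KV put.
es.index(index="blobs", id="my-key", document={"value": [1, 2, 3]})

# "Read": fetch it back directly by _id - no search query needed.
doc = es.get(index="blobs", id="my-key")
print(doc["_source"]["value"])  # [1, 2, 3]
```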

It is a static SQLite DB file.

$10/mo did seem like the standard, but I felt that it was overkill, especially since I would barely be scratching the surface of its usage. If I just wanted to run my own server that did nothing but hold the SQLite DB, is there anything a bit more practical? I was hoping it wouldn’t come out to more than a dollar or so each month.
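
For context, reading from it is just a read-only connection wherever the file ends up living - a minimal sketch with placeholder file and table names:

```python
# Minimal sketch: read-only queries against a static SQLite file.
# "data.db" and "my_table" are placeholders; mode=ro opens the file
# read-only, which suits a read-query-only workload.
import sqlite3
import pandas as pd

conn = sqlite3.connect("file:data.db?mode=ro", uri=True)
df = pd.read_sql_query("SELECT * FROM my_table LIMIT 10", conn)
conn.close()
print(df.head())
```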

Thanks for the pointer. I also looked into Heroku, but since my table has way more than 10k rows it puts me into the $9/month category. I might be considered a cheap-o here, but that just seems unnecessary for what this is doing.

I’ve deployed on Amazon EC2 and RDS

What is the difference here? I get so lost in all the options. At a fundamental level, I get that the former is just a server that can hold your database, in addition to anything else you want, but I get lost on what the latter is trying to solve and what the tradeoffs would be.

Hi,

if it’s “just” 300 MB and “just” a simple DB, maybe switching to MongoDB might be an option.
They have a hosted service called Atlas (it actually runs on GCP, AWS, or Azure),
and they have a free plan for a simple 3-node cluster with a max of 500 MB.

I use it for test DBs and have had no problems. It’s easy to use with Python.
Check the blog post: https://www.mongodb.com/blog/post/new-to-mongodb-atlas--get-started-with-free-database-tier-on-microsoft-azure
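
Connecting from Python is just a pymongo client pointed at the URI that Atlas generates for your cluster - a minimal sketch with a placeholder connection string and collection names:

```python
# Minimal sketch: reading from a MongoDB Atlas free-tier cluster.
# The mongodb+srv:// URI below is a placeholder; Atlas shows you the
# real one in its UI (srv URIs also need the dnspython extra:
# pip install "pymongo[srv]").
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:password@cluster0.example.mongodb.net/")
collection = client["mydb"]["mycollection"]

# Read-only usage, matching the original poster's workload.
for doc in collection.find({"ticker": "AAPL"}).limit(5):
    print(doc)
```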


Thanks for the pointer. I also looked into Heroku, but since my table has way more than 10k rows it puts me into the $9/month category. I might be considered a cheap-o here, but that just seems unnecessary for what this is doing.

Fair enough. I have pretty limited experience with other options, I’m afraid, so I wouldn’t want to speak authoritatively on them.

What is the difference here?

This is kind of at the edge of my understanding, but while EC2 is a fairly general service that just provides you with compute (i.e. servers that you more or less have full control over), RDS is a managed database service that runs on the same kind of servers you can provision as EC2 instances. The primary difference, therefore, is that Amazon manages the installation and configuration of the database and provides options for automatic backups, migrations, etc., whereas if you were only using EC2 instances you would have to take care of all of that yourself.

This looks like a good option. I’ll compare it with DynamoDB, which is also NoSQL and has a 25 GB free tier.

Thanks

You can use a MongoDB instance to store the relevant data and deploy it on an AWS EC2 instance.

I host my apps on Heroku (trying not to preload data but to use as much plain Python as possible). However, if the memory usage escalates, then I use my own job’s infrastructure (basically one of my internal servers) to host my app, linked to a .to or whatever free domain option I find.

I have updated this post with an improved method and thought I would share.

1. NoSQL object storage via Google Firestore (i.e. the old solution):

Thank you to everyone for your replies! While I wish I could respond and thank everyone individually, I also wouldn’t want to make this thread unwieldy. All in all, the solution I decided to go with was a NoSQL key-value storage service. Using any type of relational DBaaS, or a VM instance with my own DB running, started at around 10 USD/month, and even then my needs barely scratched the surface of the quotas.

For those interested in the finer details:

I went with Firestore, which is provided by Google Cloud Platform. I initially looked into AWS’s DynamoDB, but I found it all so overwhelming when it came to understanding pricing. With GCP things are relatively straightforward in that arena – there is no need to set up a billing account to try things out in the free tier (the API will simply throw an exception if you max out your quotas), and you have the option to easily set limits in place when you do add billing info. That said, the free tier is pretty generous. I don’t know how competitive the pricing is, but I feel that any difference would be immaterial for my needs anyway.

The biggest appeal for me was normalizing the data for document/collection storage, since there are clever ways to design the objects for optimum storage. In the case of GCP, you have a 1 MiB per-document limit, and they are very transparent about how many bytes each data type takes up. For example, I was able to bring the total writes for a ~1.2M-row timeseries database down to just 805 entries. Very different from the traditional relational SQL paradigm.
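
To sketch the row-packing idea (collection names, field names, and the chunk size here are purely illustrative, and it assumes GCP credentials are already configured; the real constraint is the 1 MiB per-document limit):

```python
# Minimal sketch: packing many timeseries rows into each Firestore
# document, using the google-cloud-firestore client.
from google.cloud import firestore

db = firestore.Client()

# Placeholder data standing in for the ~1.2M-row table.
rows = [(f"2020-01-{d:02d}", float(d)) for d in range(1, 31)]

CHUNK = 1500  # tune so each packed document stays safely under 1 MiB
for i in range(0, len(rows), CHUNK):
    chunk = rows[i:i + CHUNK]
    db.collection("timeseries").document(f"chunk-{i // CHUNK}").set({
        "timestamps": [t for t, _ in chunk],
        "values": [v for _, v in chunk],
    })
```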

I don’t advocate one over the other, but for my purposes, this was the ideal option.

2. Cache memoization with timeout=0

It may help if I add a little more context on my deployment workflow, which is that I have no control over the deployment server (unfortunately). Instead, I have to send my source code in version control to our vendors. This makes things a bit tricky since I don’t have direct control of the environment to store relatively simple datasets, and sending the data in source is not feasible.

Since I don’t need to do any CRUD operations on the data, it doesn’t need to be a database, let alone have the overhead of a VM service. Instead, I created a function that pulls the data from its online source and returns it as a dataframe. The key to this solution is that this function also gets cached, but with the timeout explicitly set to 0. This ensures that the data doesn’t have to be re-pulled, at least until the cache is cleared. I got my inspiration from this community post, so do check that out if you need more information about using caching.
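
A minimal sketch of the pattern, assuming Flask-Caching (the library the Dash docs use for memoization); the data URL is a placeholder for the real online source:

```python
# Minimal sketch: pull-once-then-cache with Flask-Caching in a Dash app.
import dash
import pandas as pd
from flask_caching import Cache

app = dash.Dash(__name__)
cache = Cache(app.server, config={"CACHE_TYPE": "filesystem", "CACHE_DIR": "./cache"})

DATA_URL = "https://example.com/data.csv"  # placeholder

@cache.memoize(timeout=0)  # timeout=0: never expires until the cache is cleared
def get_data():
    """Pull the dataset once; subsequent calls hit the filesystem cache."""
    return pd.read_csv(DATA_URL)

# Callbacks can then call get_data() freely without re-pulling the data.
```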

On a related note to my deployment workflow, it would be really nice if someone from @plotly could reach out about the premium support plans. I’ve inquired a couple of times over the past month or so and have yet to hear back from anybody. I think Plotly is doing a great job with open source development, but I really do need some premium support with deployment options, especially the Dash portal framework.

Let me know if you stumble across this and have any questions!
