We are increasing our usage of Redash and quickly encountering availability concerns. I’m seeking official confirmation from @arikfr / @jesse / et al. on this topic: is it safe to run multiple instances of the Redash webapp concurrently, in order to achieve high availability?
This question has been asked several times, over several years, but with no official response.
(note that as a new user, I’m prohibited from posting more than 2 links - there are others)
I’ll elaborate on our configuration.
We run Redash as a collection of ECS Services in AWS with an RDS database backend.
The ECS Services are all mono-services, i.e. we run exactly one instance of each:
redash adhoc worker
redash scheduled worker
We don’t have any concerns or issues with the scheduler or worker services. They are available enough for our needs, and not user-facing.
Our concern is with the ECS (mono-) Service which runs both the redash and nginx Containers for us. We want to understand whether we can safely run multiple instances of these Containers (either separately or together) in order to achieve high availability. Our concern is that, if Redash is not designed to handle concurrent write activity, we will have data corruption. Thus we’re seeking official confirmation that Redash is designed to run safely in such a multi-instance mode.
As an AWS-managed service, the database is already highly available. We’re seeking to make Redash itself also highly available as our usage increases. Any confirmation and guidance you can offer will be greatly appreciated.
Yes. It is safe to run multiple webservers and nginx instances. Redash is designed such that this is safe for concurrent write activity as it uses locks on the database tables. Prior to its EOL, hosted Redash had (I think) three webservers and three nginx instances without issue.
Looking forward: we will prepare and release a full document about high-availability Redash including a tutorial for deploying in Kubernetes.
Thanks for making another post about this. If you run into issues with an HA deployment, please post and discuss them here. It will help us in forming the official document on the subject.
Thank you, @jesse ! If you happen to remember, please post that doc here once it’s available. Our group will review for sure.
It is great to have this confirmation. Currently we’re having issues where a single user can easily crash Redash (just by opening 10-15 queries in different tabs of the browser). We need to fix that issue first because our Redash just isn’t stable at the moment. But knowing we can run multiple instances concurrently could help to reduce some of the user-facing errors that result from an instance being taken offline for any reason.
Quick side thought on concurrent writes: locking DB tables is one thing, but consider this scenario.
User A pulls up a certain Query in Redash. A few seconds later User B pulls up the same query. Each user makes independent, non-conflicting changes. User A saves changes, then a few seconds later User B saves.
Will the resultant Query contain User A’s changes? It sounds like they will be overwritten. DB table locks don’t help here, and this situation would result in end-user data loss, i.e. User A’s work would be lost.
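To make the scenario concrete, here is a minimal sketch that has nothing to do with Redash's actual code. A hypothetical in-memory `store` dict stands in for the stored query, and each save writes back the whole object the user loaded. That whole-object write is an assumption of the sketch, and it is what causes the overwrite.

```python
import copy

# Hypothetical stand-in for the stored query; not Redash's real schema.
store = {"query": {"name": "Daily signups", "sql": "SELECT 1"}}

def load():
    # Each user gets a private copy of the query as it was at load time.
    return copy.deepcopy(store["query"])

def save(edited):
    # Whole-object write: the saved copy replaces the stored query entirely.
    store["query"] = copy.deepcopy(edited)

user_a = load()
user_b = load()  # loaded before User A saves

user_a["name"] = "Daily signups (v2)"  # User A changes only the name
user_b["sql"] = "SELECT 2"             # User B changes only the SQL

save(user_a)
save(user_b)  # User B's stale copy still carries the old name

print(store["query"])
```

After both saves the stored name is back to its original value, even though each write on its own completed without error. A lock around each write would not change that outcome.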
By default this is not possible, because only the query owner can edit the query. If experimental multi-owner support is enabled and both users own the query, then User A’s change would be lost. This is really a front-end concern, though. It would make sense to add a check that verifies the query hash from the API has not changed since the page was loaded.
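A rough sketch of that kind of check, under stated assumptions: the names `fingerprint`, `load`, `save_if_unchanged` and `StaleQueryError` are hypothetical, the hash is computed here with `hashlib` over the whole stored object, and the `store` dict stands in for the database. None of this is Redash's API; it only illustrates rejecting a save when the stored query has changed since it was loaded.

```python
import copy
import hashlib
import json

class StaleQueryError(Exception):
    """Raised when the stored query changed after the editor loaded it."""

# Hypothetical stand-in for the stored query; not Redash's real schema.
store = {"query": {"name": "Daily signups", "sql": "SELECT 1"}}

def fingerprint(query):
    # Stable hash of the query contents (sorted keys keep it deterministic).
    payload = json.dumps(query, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def load():
    # Return a private copy plus the hash of what was loaded.
    query = copy.deepcopy(store["query"])
    return query, fingerprint(query)

def save_if_unchanged(edited, hash_at_load):
    # Refuse the write if someone else saved since this copy was loaded.
    if fingerprint(store["query"]) != hash_at_load:
        raise StaleQueryError("query changed since it was loaded")
    store["query"] = copy.deepcopy(edited)

user_a, hash_a = load()
user_b, hash_b = load()

user_a["name"] = "Daily signups (v2)"
user_b["sql"] = "SELECT 2"

save_if_unchanged(user_a, hash_a)  # accepted: nothing changed since load
try:
    save_if_unchanged(user_b, hash_b)
    b_saved = True
except StaleQueryError:
    b_saved = False  # rejected: User B must reload and reapply the edit
```

This sketch is single-threaded. With several webservers, the comparison and the write would need to happen atomically, for example inside one database transaction, or two saves could still both pass the check.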