Is there any document, case study, or analysis describing the load, hardware/infra requirements, and hence the cost of self-hosting Redash?
Interesting question. Haven’t seen anything like that, but it would be good info.
As a data point for the low end:
- For the sqlitebrowser.org OSS project, we have a Redash server set up to run and publicly display just three queries (viewable here). The queries automatically refresh every 10 minutes.
That server is a super cheapo virtual machine with 2GB of RAM and 2 x86_64 CPU cores, costing €2.99 per month.
It’s a tight fit, with the server not really having much RAM free: ~300MB or so, according to Zabbix (our monitoring software).
Although it’s “fine” for this specific use case, if I were putting together a Redash server for multiple people to use, I’d throw a lot more RAM at it, and probably more CPU cores.
As above, I don’t think I’ve seen anything specifically covering this, but to give some guidance:
We currently have 12 dashboards and 52 queries across 8 data sources, most of them refreshing every minute or every 5 minutes. Only 2 or 3 concurrent users though (if that).
This is all currently running on an oldish ThinkPad laptop (albeit with an old i7 + 16GB RAM), though with a view to shifting it to a hosted plan at some point.
We anticipate having at least 200 users, likely firing 20k queries a day. Some of them might be ad hoc, covering the last 15 or 30 minutes; some might be scheduled ones querying days’ or weeks’ worth of data. There are quite a few components in Redash: Redis, workers, the PostgreSQL database …
Sounds like you’ll need to do some initial R&D / experimentation.
Haven’t tried spreading things over multiple servers yet, but it seems like the underlying components are designed so they’d fan out ok.
It uses a queueing mechanism that can handle multiple workers, so you’d be able to try things out and measure relevant performance (eg median latency for specific long running reports, adhoc queries, etc) with different resource levels (eg 10 workers of X config, 10 workers of Y config, 20 workers of X config, and so on).
Should be able to determine useful guidelines with that approach. Could take a few weeks of effort, but that’s a reasonable thing given a likely substantial deployment with hundreds of users.
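As a rough sketch of how the results of such an experiment could be compared: the snippet below summarizes latency samples per worker configuration by median and 95th percentile. The configuration labels and latency numbers are made-up placeholders, not measurements from any real Redash deployment, and `summarize` and `fastest_by_median` are hypothetical helper names.

```python
from statistics import median, quantiles

def summarize(latencies_by_config):
    """Return median and 95th-percentile latency (seconds) per configuration."""
    summary = {}
    for config, samples in latencies_by_config.items():
        # quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile.
        p95 = quantiles(samples, n=20)[-1]
        summary[config] = {"median_s": median(samples), "p95_s": p95}
    return summary

def fastest_by_median(summary):
    """Pick the configuration with the lowest median latency."""
    return min(summary, key=lambda config: summary[config]["median_s"])

# Placeholder data (assumed): seconds taken by the same long-running report
# under two hypothetical worker setups.
measurements = {
    "10 workers, config X": [4.0, 5.0, 6.0, 5.5, 30.0],
    "20 workers, config X": [3.0, 3.5, 4.0, 3.2, 12.0],
}

summary = summarize(measurements)
for config, stats in summary.items():
    print(f"{config}: median={stats['median_s']:.1f}s p95={stats['p95_s']:.1f}s")
print("Lowest median:", fastest_by_median(summary))
```

Looking at a high percentile alongside the median is deliberate here: a configuration can look fine on median latency while a few queries still queue up for a long time.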
We only documented the minimum requirements: 2GB of RAM and a modest CPU. As your usage grows, what really impacts the hardware needs is the number of processes you use to serve API requests (grows slowly) and the number of workers you use to run your queries.
It’s possible to have 10 users generate more workload than 200 users. All depends on the usage pattern.
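To put rough numbers on that, here is a back-of-the-envelope sketch using Little's law (average concurrency = arrival rate × average time in service). The 20k queries a day figure comes from the question above; the query runtimes and the second, "few heavy users" scenario are purely assumed for illustration, and `average_busy_workers` is a hypothetical helper, not anything in Redash.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def average_busy_workers(queries_per_day, mean_runtime_s):
    """Little's law: mean number of queries in flight = arrival rate * mean runtime."""
    arrival_rate = queries_per_day / SECONDS_PER_DAY  # queries per second
    return arrival_rate * mean_runtime_s

# Scenario A (assumed runtime): 200 users, 20k short queries a day at ~2 s each.
many_light_users = average_busy_workers(20_000, 2.0)

# Scenario B (fully assumed): 10 users, 500 heavy queries a day at ~120 s each.
few_heavy_users = average_busy_workers(500, 120.0)

print(f"200 users, light queries: {many_light_users:.2f} workers busy on average")
print(f"10 users, heavy queries:  {few_heavy_users:.2f} workers busy on average")
```

These are daily averages only. Real traffic is bursty, so peak-hour load will be higher and needs to be measured rather than guessed.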
The good news is that, as mentioned, every component in Redash can be scaled out easily on its own. You can scale the number of workers, API servers, etc. independently, and you can deploy them all on a single machine or across multiple ones.
I would start with a simple deployment, see where the bottlenecks are and scale accordingly.
If you know you will need to scale, to make it easier, I would put the Redis & PostgreSQL servers that serve Redash on a separate machine. Preferably just use something like RDS for PostgreSQL (or a similar offering from the other clouds).