Following on from this GH issue, I’m trying to set up a deployment with specific workers servicing specific data sources. Any help would be appreciated.
By default, every ad hoc query (invoked by a user) uses the `queries` Celery queue, while every scheduled query (invoked by the scheduler) uses the `scheduled_queries` queue.
Celery lets you set which queue(s) each worker services. Our default production setup already uses a separate worker for the ad hoc queue and for the scheduled queue (note the `QUEUES` env var in the worker configuration).
While it’s not exposed in the UI, each data source has a setting for which queue to use for ad hoc queries and which to use for scheduled queries. These are defined by the `queue_name` and `scheduled_queue_name` values in its database row.
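To make the routing idea concrete, here is a minimal sketch of how a per-data-source queue setting could be resolved. It is not Redash's actual code: the `DataSource` class and the `pick_queue` helper are hypothetical, and only the two field names and their default queue names come from the description above.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical stand-in for a data source row (not Redash's real model)."""
    name: str
    queue_name: str = "queries"                      # queue for ad hoc queries
    scheduled_queue_name: str = "scheduled_queries"  # queue for scheduled queries


def pick_queue(data_source: DataSource, scheduled: bool) -> str:
    """Return the queue a query against this data source would be sent to."""
    if scheduled:
        return data_source.scheduled_queue_name
    return data_source.queue_name


# A data source left on the defaults, and one pointed at a dedicated queue.
default_ds = DataSource(name="warehouse")
isolated_ds = DataSource(
    name="restricted_db",
    queue_name="restricted_queries",
    scheduled_queue_name="restricted_scheduled_queries",
)
```

A worker that listens only on `restricted_queries` and `restricted_scheduled_queries` would then be the only one to pick up work for `restricted_db`.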
So to direct a specific data source to use a separate worker:
- Start another worker configured to listen on a specific queue.
- Update the `queue_name` and `scheduled_queue_name` values of the data source database row.
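The second step could look roughly like the sketch below. It runs against an in-memory SQLite table so that it works anywhere; the table name `data_sources`, the `id` column, the `route_data_source` helper, and the queue name `restricted_queries` are assumptions made for illustration. On a real deployment the equivalent `UPDATE` would be run against the Redash Postgres database instead.

```python
import sqlite3


def route_data_source(conn, data_source_id, queue, scheduled_queue):
    """Point one data source at the given ad hoc and scheduled queues.

    Returns the number of rows changed (0 if the id does not exist).
    """
    cur = conn.execute(
        # SQLite uses "?" placeholders; a Postgres driver such as psycopg2 uses "%s".
        "UPDATE data_sources "
        "SET queue_name = ?, scheduled_queue_name = ? "
        "WHERE id = ?",
        (queue, scheduled_queue, data_source_id),
    )
    conn.commit()
    return cur.rowcount


# In-memory stand-in for the real table (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_sources ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "queue_name TEXT, scheduled_queue_name TEXT)"
)
conn.execute(
    "INSERT INTO data_sources VALUES "
    "(1, 'restricted_db', 'queries', 'scheduled_queries')"
)

changed = route_data_source(
    conn, 1, "restricted_queries", "restricted_scheduled_queries"
)
```

The new worker from the first step then needs to be started with its queue list set to the same queue names, otherwise queries for that data source will sit in the queue unserviced.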
We’re now in the process of switching to RQ instead of Celery, but it will use a similar concept of queues.
Btw, why do you need it to be serviced by a different worker?
Thanks for the reply!
The short version of the story is that due to complex and somewhat questionable corporate firewall policies, I can’t access all the data sources I want from a single place in the network. I can, however, run multiple workers that can all access the same central Postgres instance.
Will try this afternoon and report back if there are any issues.
Oh, this makes perfect sense and should work as long as the workers have access to Postgres & Redis.
Worked
Marked as solution for the next person who comes along - thanks for the help!