Worker architecture question

Following on from :point_up: this GH issue, I’m trying to set up a deployment with specific workers servicing specific data sources. Any help would be appreciated.

By default, every adhoc query (invoked by a user) uses the queries Celery queue, while every scheduled query (invoked by the scheduler) uses the scheduled_queries queue.

Celery lets you set which queue(s) each worker services. Our default production setup already uses separate workers for the adhoc queue and the scheduled queue:

(note the QUEUES env var)

While it’s not exposed in the UI, each data source has a setting for which queue to use for adhoc queries and which to use for scheduled queries. These are defined by the queue_name and scheduled_queue_name values in the data source’s database row.
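To make the routing rule concrete, here is a minimal sketch in plain Python. It is not Redash’s actual code: the DataSource class and the pick_queue helper are hypothetical names made up for illustration. Only the queue_name and scheduled_queue_name fields and the two default queue names come from the explanation above.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical stand-in for a data source row."""
    name: str
    # Defaults mirror the two default queues described above.
    queue_name: str = "queries"
    scheduled_queue_name: str = "scheduled_queries"


def pick_queue(data_source: DataSource, scheduled: bool) -> str:
    """Return the queue a query for this data source would be sent to."""
    if scheduled:
        return data_source.scheduled_queue_name
    return data_source.queue_name


# A data source left on the defaults, and one pointed at its own queues.
# The "restricted*" names are invented for this example.
default_ds = DataSource("warehouse")
isolated_ds = DataSource(
    "restricted_db",
    queue_name="restricted",
    scheduled_queue_name="restricted_scheduled",
)

print(pick_queue(default_ds, scheduled=False))
print(pick_queue(isolated_ds, scheduled=True))
```

Only a worker that listens on the queue returned here would pick up the query, which is what makes per-data-source workers possible.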

So to direct a specific data source to use a separate worker:

  1. Start another worker configured to listen on a specific queue.
  2. Update the queue_name and scheduled_queue_name values of the data source database row.
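Step 2 amounts to a single UPDATE on the data source row. The sketch below runs it against an in-memory SQLite table so it is runnable anywhere. The table name data_sources, its reduced schema, the route_data_source helper, and the queue names are all assumptions for illustration, so check the real schema in your Postgres instance before running anything like this.

```python
import sqlite3

# In-memory stand-in for the metadata database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_sources ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "queue_name TEXT, scheduled_queue_name TEXT)"
)
conn.execute(
    "INSERT INTO data_sources (id, name, queue_name, scheduled_queue_name) "
    "VALUES (1, 'restricted_db', 'queries', 'scheduled_queries')"
)


def route_data_source(conn, data_source_id, adhoc_queue, scheduled_queue):
    """Point one data source at dedicated queues; returns rows updated."""
    cur = conn.execute(
        "UPDATE data_sources "
        "SET queue_name = ?, scheduled_queue_name = ? "
        "WHERE id = ?",
        (adhoc_queue, scheduled_queue, data_source_id),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical queue names; the worker from step 1 must listen on them.
updated = route_data_source(conn, 1, "restricted", "restricted_scheduled")
row = conn.execute(
    "SELECT queue_name, scheduled_queue_name FROM data_sources WHERE id = 1"
).fetchone()
print(updated, row)
```

The worker started in step 1 would then need those same two queue names in its queue list (the QUEUES env var mentioned above), otherwise queries for that data source would sit in the queue unserviced.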

We’re now in the process of switching to RQ instead of Celery, but it will use a similar concept of queues.

Btw, why do you need it to be serviced by a different worker?

Thanks for the reply!

The short version of the story is that due to complex and somewhat questionable corporate firewall policies, I can’t access all the data sources I want from a single place in the network. I can, however, run multiple workers that can all access the same central Postgres instance.

Will try this afternoon and report back if there are any issues

Oh, this makes perfect sense and should work as long as the workers have access to Postgres & Redis.

Worked :partying_face:

Marked as solution for the next person who comes along - thanks for the help!