Cannot deploy without downtime: scheduler process exits with ValueError("There's already an active RQ scheduler")

Issue Summary

Cannot deploy without downtime, because the scheduler process exits with an error.

We are self-hosting Redash on AWS ECS. The architecture of our infrastructure is almost the same as what I posted there.

When a new ECS task starts while an existing task is still running, the scheduler process (part of the RQ stack newly introduced in v9-beta) hits an error and quits.

Judging from the message raise ValueError("There's already an active RQ scheduler"), the newly started scheduler detects the existing one, which is undesirable when trying to deploy without downtime.

If I stop the existing Redash process completely, the new Redash process with the new ECS task definition can start, but that results in significant Redash downtime.

Is there any way to avoid this error?

Technical details:

  • Redash Version: v9 beta
  • How did you install Redash: docker-compose on ECS

Here is the error log from the scheduler process.

[2021-01-16 10:45:22,192][PID:1][DEBUG][redash.destinations] Registering Mattermost (mattermost) destinations.
[2021-01-16 10:45:22,193][PID:1][DEBUG][redash.destinations] Registering ChatWork (chatwork) destinations.
[2021-01-16 10:45:22,200][PID:1][DEBUG][redash.destinations] Registering PagerDuty (pagerduty) destinations.
[2021-01-16 10:45:22,201][PID:1][DEBUG][redash.destinations] Registering Google Hangouts Chat (hangouts_chat) destinations.
[2021-01-16 10:45:23,856][PID:1][INFO][rq_scheduler.scheduler] Registering birth
Traceback (most recent call last):
  File "/app/manage.py", line 9, in <module>
    manager()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask/cli.py", line 586, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask/cli.py", line 426, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/app/redash/cli/rq.py", line 31, in scheduler
    rq_scheduler.run()
  File "/usr/local/lib/python3.7/site-packages/rq_scheduler/scheduler.py", line 404, in run
    self.register_birth()
  File "/usr/local/lib/python3.7/site-packages/rq_scheduler/scheduler.py", line 46, in register_birth
    raise ValueError("There's already an active RQ scheduler")
ValueError: There's already an active RQ scheduler
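For context, here is a rough sketch of the mutual-exclusion logic that (as far as I understand from the traceback) produces this error. It is not the actual rq-scheduler source: the key name, the "birth"/"death" field names, and the in-memory dict standing in for Redis are all assumptions for illustration — the point is just that a second scheduler refuses to start while the first one's registration is still present.

```python
# Sketch (NOT the real rq-scheduler code) of why a second scheduler fails
# to start while the first is alive. A dict of hashes stands in for Redis.

SCHEDULER_KEY = "rq:scheduler"  # key name is an assumption

def register_birth(store):
    """Refuse to start if another scheduler is registered and not dead."""
    existing = store.get(SCHEDULER_KEY)
    if existing is not None and "death" not in existing:
        raise ValueError("There's already an active RQ scheduler")
    store[SCHEDULER_KEY] = {"birth": "now"}

def register_death(store):
    """Mark this scheduler as shut down so a successor may start."""
    store.setdefault(SCHEDULER_KEY, {})["death"] = "now"

store = {}
register_birth(store)        # first scheduler starts fine
try:
    register_birth(store)    # second scheduler, started during deploy
except ValueError as e:
    print(e)                 # same message as in the log above
register_death(store)        # first scheduler shuts down cleanly
register_birth(store)        # now the replacement can start
```

If this sketch is roughly right, the error is expected whenever the old and new schedulers overlap, which is exactly what happens during a rolling ECS deployment.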

Our docker-compose.yml:

version: "2"
x-redash-service: &redash-service
  image: redash/redash:9.0.0-beta.b42121
services:
  server:
    <<: *redash-service
    command: "server"
    ports:
      - "0:5000"
    environment:
      REDASH_WEB_WORKERS: 4
    env_file: .env
    logging:
      driver: awslogs
      options:
        awslogs-region: $AWS_REGION
        awslogs-group: $AWS_LOG_GROUP
        awslogs-stream-prefix: redash-server
  create_db:
    <<: *redash-service
    command: "create_db"
    env_file: .env
    logging:
      driver: awslogs
      options:
        awslogs-region: $AWS_REGION
        awslogs-group: ecs-redash
        awslogs-stream-prefix: redash-createdb
  scheduler:
    <<: *redash-service
    command: scheduler
    env_file: .env
    logging:
      driver: awslogs
      options:
        awslogs-region: $AWS_REGION
        awslogs-group: $AWS_LOG_GROUP
        awslogs-stream-prefix: redash-server
  scheduled_worker:
    <<: *redash-service
    command: worker
    environment:
      QUEUES: "scheduled_queries,schemas"
      WORKERS_COUNT: 1
    env_file: .env
    logging:
      driver: awslogs
      options:
        awslogs-region: $AWS_REGION
        awslogs-group: $AWS_LOG_GROUP
        awslogs-stream-prefix: redash-server
  adhoc_worker:
    <<: *redash-service
    command: worker
    env_file: .env
    environment:
      QUEUES: "queries"
      WORKERS_COUNT: 2
    logging:
      driver: awslogs
      options:
        awslogs-region: $AWS_REGION
        awslogs-group: $AWS_LOG_GROUP
        awslogs-stream-prefix: redash-server
  worker:
    <<: *redash-service
    command: worker
    env_file: .env
    environment:
      QUEUES: "periodic emails default"
      WORKERS_COUNT: 1
    logging:
      driver: awslogs
      options:
        awslogs-region: $AWS_REGION
        awslogs-group: $AWS_LOG_GROUP
        awslogs-stream-prefix: redash-server

The best practice here is to run the scheduler in a separate container so you can spin down the old one and then start the new one afterward. Since this is only the scheduler service, any user-facing "downtime" would be minimal: ad-hoc query execution, navigating the interface, etc. wouldn't be affected.
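For example, the scheduler could be pulled out of the shared compose file into its own file, deployed as a separate ECS task. This is only an illustrative sketch based on the config posted above; the file name and the `redash-scheduler` stream prefix are made up:

```yaml
# docker-compose.scheduler.yml — scheduler only, so it can be stopped
# before its replacement starts, independently of server and workers.
version: "2"
services:
  scheduler:
    image: redash/redash:9.0.0-beta.b42121
    command: scheduler
    env_file: .env
    logging:
      driver: awslogs
      options:
        awslogs-region: $AWS_REGION
        awslogs-group: $AWS_LOG_GROUP
        awslogs-stream-prefix: redash-scheduler
```

The remaining services (server and the worker variants) would stay in the original compose file and can keep rolling over with overlap, since only the scheduler enforces single-instance behavior.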

Thank you so much for your reply. Your solution makes sense: with separate containers, each can have its own lifecycle, as you describe.
Just a side note: as a newbie to the Python ecosystem, I had imagined RQ scheduler might have an option to allow multiple instances to run.

In terms of ECS / docker-compose.yml, I haven't found a clear way to segregate the scheduler container from the rest of the containers into another 'task definition' (an ECS term) within an ECS 'service'. I know this is an ECS issue rather than a Redash one. I'll give it a try the next time we need to update Redash without downtime.