Query stuck in waiting in kubernetes


#1

Tried running a query on mongodb(Test connection action while adding data source succeeded). However, it remains in waiting queue forever.

Below is the configuration being used:
Redash is deployed on kubernetes AWS EC2.
Redash version: 5.0.1.b4850
Postgress version: 9.6.10
Redis: 4.0.11

Using 4 celery workers. Redash-Worker via ps -ef:

/usr/bin/python /usr/local/bin/celery worker --app=redash.worker --beat -c4 -Qqueries,scheduled_queries,celery -linfo --maxtasksperchild=10 -Ofair

Redash-Web via ps -ef:

/usr/bin/python /usr/local/bin/gunicorn -b 0.0.0.0:5000 --name redash -w2 redash.wsgi:app

LOGS

redash-web:

[2018-10-15 12:21:04,121][PID:11][DEBUG][metrics] table=organizations query=select duration=1.96
[2018-10-15 12:21:04,122][PID:11][DEBUG][root] Current organization: eXtendTech (1) (slug: default)
[2018-10-15 12:21:04,124][PID:11][DEBUG][metrics] table=users query=select duration=1.34
[2018-10-15 12:21:04,125][PID:11][INFO][metrics] method=GET path=/api/jobs/3ed05cea-3834-4453-a8c8-cc251f901902 endpoint=job status=200 content_type=application/json content_length=123 duration=7.17 query_count=2 query_duration=3.29
[2018-10-15 12:21:04,842][PID:11][DEBUG][metrics] table=organizations query=select duration=2.01
[2018-10-15 12:21:04,843][PID:11][DEBUG][root] Current organization: eXtendTech (1) (slug: default)
[2018-10-15 12:21:04,845][PID:11][DEBUG][metrics] table=users query=select duration=1.41
[2018-10-15 12:21:04,847][PID:11][DEBUG][metrics] table=groups query=select duration=1.15
[2018-10-15 12:21:04,848][PID:11][INFO][metrics] method=GET path=/api/admin/queries/tasks endpoint=redash_queries_tasks status=200 content_type=application/json content_length=401 duration=8.95 query_count=3 query_duration=4.57
[2018-10-15 12:21:08,098][PID:14][DEBUG][metrics] table=organizations query=select duration=1.99
[2018-10-15 12:21:08,098][PID:14][DEBUG][root] Current organization: eXtendTech (1) (slug: default)
[2018-10-15 12:21:08,100][PID:14][DEBUG][metrics] table=users query=select duration=1.39
[2018-10-15 12:21:08,101][PID:14][INFO][metrics] method=GET path=/api/jobs/3ed05cea-3834-4453-a8c8-cc251f901902 endpoint=job status=200 content_type=application/json content_length=123 duration=6.53 query_count=2 query_duration=3.37
[2018-10-15 12:21:09,838][PID:14][DEBUG][metrics] table=organizations query=select duration=1.98
[2018-10-15 12:21:09,838][PID:14][DEBUG][root] Current organization: eXtendTech (1) (slug: default)
[2018-10-15 12:21:09,841][PID:14][DEBUG][metrics] table=users query=select duration=1.38
[2018-10-15 12:21:09,843][PID:14][DEBUG][metrics] table=groups query=select duration=1.17
[2018-10-15 12:21:09,844][PID:14][INFO][metrics] method=GET path=/api/admin/queries/tasks endpoint=redash_queries_tasks status=200 content_type=application/json content_length=401 duration=9.35 query_count=3 query_duration=4.53
[2018-10-15 12:21:12,097][PID:11][DEBUG][metrics] table=organizations query=select duration=1.92
[2018-10-15 12:21:12,097][PID:11][DEBUG][root] Current organization: eXtendTech (1) (slug: default)
[2018-10-15 12:21:12,099][PID:11][DEBUG][metrics] table=users query=select duration=1.43
[2018-10-15 12:21:12,100][PID:11][INFO][metrics] method=GET path=/api/jobs/3ed05cea-3834-4453-a8c8-cc251f901902 endpoint=job status=200 content_type=application/json content_length=123 duration=6.68 query_count=2 query_duration=3.35

redash-worker:

[2018-10-15 12:20:08,841][PID:1][INFO][MainProcess] Received task: redash.tasks.refresh_queries[a295f98a-aa63-4740-82ed-c6f8420ce3c8]                                                                              
[2018-10-15 12:20:08,842][PID:35][INFO][Worker-9] task_name=redash.tasks.refresh_queries task_id=a295f98a-aa63-4740-82ed-c6f8420ce3c8 Refreshing queries...                                                        
[2018-10-15 12:20:08,847][PID:35][INFO][Worker-9] task_name=redash.tasks.refresh_queries task_id=a295f98a-aa63-4740-82ed-c6f8420ce3c8 Done refreshing queries. Found 0 outdated queries: []                        
[2018-10-15 12:20:08,849][PID:1][INFO][MainProcess] Task redash.tasks.refresh_queries[a295f98a-aa63-4740-82ed-c6f8420ce3c8] succeeded in 0.00767686600011s: None                                                   
[2018-10-15 12:20:38,860][PID:23][INFO][Beat] Scheduler: Sending due task refresh_queries (redash.tasks.refresh_queries)                                                                                           
[2018-10-15 12:20:38,863][PID:1][INFO][MainProcess] Received task: redash.tasks.refresh_queries[c88a5f5b-b72a-426f-9623-6e16c8949875]                                                                              
[2018-10-15 12:20:38,864][PID:26][INFO][Worker-6] task_name=redash.tasks.refresh_queries task_id=c88a5f5b-b72a-426f-9623-6e16c8949875 Refreshing queries...                                                        
[2018-10-15 12:20:38,870][PID:26][INFO][Worker-6] task_name=redash.tasks.refresh_queries task_id=c88a5f5b-b72a-426f-9623-6e16c8949875 Done refreshing queries. Found 0 outdated queries: []                        
[2018-10-15 12:20:38,872][PID:1][INFO][MainProcess] Task redash.tasks.refresh_queries[c88a5f5b-b72a-426f-9623-6e16c8949875] succeeded in 0.00825275199986s: None                                                   
[2018-10-15 12:21:08,861][PID:23][INFO][Beat] Scheduler: Sending due task refresh_queries (redash.tasks.refresh_queries)                                                                                           
[2018-10-15 12:21:08,863][PID:1][INFO][MainProcess] Received task: redash.tasks.refresh_queries[ec68fe5c-65d7-4906-aa5a-250e81999103]                                                                              
[2018-10-15 12:21:08,864][PID:29][INFO][Worker-7] task_name=redash.tasks.refresh_queries task_id=ec68fe5c-65d7-4906-aa5a-250e81999103 Refreshing queries...                                                        
[2018-10-15 12:21:08,869][PID:29][INFO][Worker-7] task_name=redash.tasks.refresh_queries task_id=ec68fe5c-65d7-4906-aa5a-250e81999103 Done refreshing queries. Found 0 outdated queries: []                        
[2018-10-15 12:21:08,871][PID:1][INFO][MainProcess] Task redash.tasks.refresh_queries[ec68fe5c-65d7-4906-aa5a-250e81999103] succeeded in 0.00751525400028s: None

I have tried following as a workaround, but none of them worked:

  • Updating CELERYD_PREFETCH_MULTIPLIER to 4 as describe in PR #1783.
  • Updating redash/worker.py by using high timeout for asynpool.PROC_ALIVE_TIMEOUT as mentioned in comment