Hi team,
We are currently facing an issue where queries get stuck in the queued state, sometimes for more than 3 hours. Initially we thought the problem was the number of workers allotted to serve the queries, and we recently increased the number of ad-hoc query workers, but the problem still persists. We have not been able to find the exact root cause.
Our current worker setup is as follows:

```yaml
environment:
  PYTHONUNBUFFERED: 0
  REDASH_LOG_LEVEL: "INFO"
  REDASH_REDIS_URL: "redis://redis:6379/0"
  REDASH_DATABASE_URL: "postgresql://postgres@postgres/postgres"
  QUEUES: "queries,scheduled_queries,celery"
  WORKERS_COUNT: 12
```

while the server and scheduler services (the ad-hoc side) are configured as follows:

```yaml
server:
  <<: *redash-service
  command: server
  ports:
    - "5000:5000"
  environment:
    REDASH_WEB_WORKERS: 4
scheduler:
  <<: *redash-service
  command: scheduler
  environment:
    QUEUES: "celery"
    WORKERS_COUNT: 1
```
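To see where the backlog actually builds up, here is a minimal check (a sketch, assuming the broker URL and queue names from the compose environment above; with a Redis broker, Celery keeps each queue as a plain Redis list):

```python
# Sketch: report how many tasks are waiting in each Celery queue.
# The broker URL and queue names come from the compose file above;
# adjust them if your deployment differs.
import redis

r = redis.Redis.from_url("redis://redis:6379/0")

for queue in ("queries", "scheduled_queries", "celery"):
    # Celery's Redis transport stores each queue as a list, so LLEN
    # gives the number of tasks still waiting to be picked up.
    print(queue, r.llen(queue))
```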
We currently have around 568 users using Redash to query Snowflake. What exactly is going wrong?
Yes, this happens with ad-hoc queries only. Generally everything works fine, but as traffic increases, the number of queries in the queries queue keeps growing until every query ends up stuck in the queued state.
We also flushed the query queue using FLUSHALL in Redis, but the queries started queuing up again. [Question]: How can we check the scheduler and decrease the scheduling intervals, if that could be the issue?
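For what it's worth, FLUSHALL wipes the entire Redis database, Celery's broker state included, so the backlog simply rebuilds once traffic resumes. A narrower option, under the same broker and queue-name assumptions as the sketch above, is to drop only the backlogged list:

```python
# Sketch: clear only the stuck ad-hoc queue instead of running FLUSHALL.
# Assumes the "queries" list from the compose file is the one backing up.
import redis

r = redis.Redis.from_url("redis://redis:6379/0")
print("queued tasks about to be dropped:", r.llen("queries"))
r.delete("queries")
```

Note that this only clears the symptom: if the workers are still occupied, new queries will queue up again, which matches what happened after FLUSHALL.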
Initially we increased the ad-hoc worker count from 2 → 5. By "increasing the worker count", which worker exactly are you referring to: the scheduler, the adhoc_worker, or REDASH_WEB_WORKERS? And what exactly is the impact of increasing each of these on performance?
Workers are the processes that execute your queries. If you have many queries waiting in the queue, that is because no worker is popping them off the queue. In most cases it's because you have long-running queries that hog the workers. Once those long-running queries finish, a worker is freed to take the next query and execute it.
Increasing the number of workers can potentially help. However, if you have N long-running queries at a given time, you will need at least N + 1 workers before you'd notice a difference in your ad-hoc executions.
Typically, for long-running queries you would prefer scheduled execution, because scheduled queries have their own queue and workers. That way long-running background tasks don't affect the performance of ad-hoc queries.
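One way to confirm whether that is what's happening is to ask the running workers what they are executing (a minimal sketch; the only assumption is the Redis broker URL from the compose file, since Celery's inspect API simply broadcasts to whatever workers are connected to that broker):

```python
# Sketch: list the tasks each Celery worker is executing right now.
from celery import Celery

app = Celery(broker="redis://redis:6379/0")
active = app.control.inspect().active() or {}  # {} if no worker responds

for worker, tasks in active.items():
    print(f"{worker}: {len(tasks)} task(s) in flight")
    for task in tasks:
        print(f"  {task.get('name')} id={task.get('id')}")
```

If every worker reports a full set of in-flight tasks whenever the backlog grows, that matches the "long-running queries are hogging the workers" explanation above.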
@k4s1m Thanks for the useful information.
I have a few questions related to this:
Q1: Are you recommending that we increase the number of scheduled workers?
Q2: Does increasing the number of scheduled query workers automatically make them serve long-running queries instead of ad-hoc queries?
Q3: How can one detect the number of long-running queries?
Given the above points, do long-running queries not get flushed when we flush the other queries from Redis? Currently we automatically kill queries on the Snowflake side if the execution time exceeds 5 minutes.
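For Q3, there isn't a single built-in counter for long-running queries, but a rough way to see whether they are occupying every worker slot is to compare the number of tasks currently executing with the backlog (a sketch, reusing the broker URL and queue name assumed earlier in the thread):

```python
# Sketch: gauge whether long-running queries have saturated the workers.
import redis
from celery import Celery

BROKER = "redis://redis:6379/0"

r = redis.Redis.from_url(BROKER)
app = Celery(broker=BROKER)

active = app.control.inspect().active() or {}
busy = sum(len(tasks) for tasks in active.values())  # tasks executing now
backlog = r.llen("queries")                          # tasks still waiting

print(f"tasks currently executing: {busy}")
print(f"tasks waiting in 'queries': {backlog}")
# If `busy` stays pinned near your total WORKERS_COUNT while `backlog`
# keeps growing, the executing tasks are the long-running ones holding
# every worker slot.
```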
I have experienced this too. Queries go into a black box with no resolution or logs.
Try changing the query a bit and rerunning. It's as if the generated query hash gets stuck somewhere.
@k4s1m, old thread, but I've ended up here because I'm facing the same issue. I added REDASH_ADHOC_QUERY_TIME_LIMIT, but it doesn't seem to be killing the ad-hoc queries. We're on 8.0.0.b32245, Docker setup.
Restarting everything works, but that's not feasible since the issue has been happening quite often lately. Any suggestions?
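One thing worth ruling out first (a sanity check, not a confirmed fix): make sure the variable is actually visible inside the running Redash containers, not just present in the compose file. A minimal check, run inside each container (for example via docker exec):

```python
# Sketch: confirm the time-limit variable is present in this container's
# environment. Run it inside each Redash service (server and workers).
import os

print("REDASH_ADHOC_QUERY_TIME_LIMIT =",
      os.environ.get("REDASH_ADHOC_QUERY_TIME_LIMIT", "<not set>"))
```

If the variable is missing from some of the services, the limit may simply never be applied to the ad-hoc tasks; setting it on all Redash services and recreating the containers removes that ambiguity.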