This is my understanding as well. I believe if either 2 or 3 were the case, it would appear in the logs.
Can you check that at least one of your worker processes is monitoring the periodic jobs queue?
It does. We have followed the instructions provided here to add a new service for general RQ jobs and our configuration is almost identical to the one in step 5 (please see below). Emails and tasks in the default queue are executed normally.
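For reference, a worker service that watches the periodic queue looks roughly like this in the setup's docker-compose.yml (a sketch with illustrative values, not the poster's actual configuration):

```yaml
worker:
  <<: *redash-service        # shared image/env anchor from the Redash compose file
  command: worker
  environment:
    QUEUES: "periodic emails default"   # must include "periodic" for the cleanup job
    WORKERS_COUNT: 1
```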
Thanks for the details. If you check your docker logs, do you see a message like this from the scheduler service when it starts?
[2022-03-15 17:20:25,452][PID:8][INFO][redash.tasks.schedule] Scheduling e27209059575fcc17c527c47d0957cb21756e551 (cleanup_query_results) with interval 300.
The interval is in seconds. So 300 here means it will run every five minutes.
When it runs you should see a log message from one of the worker services that observes the periodic queue. The message will look like this:
[2022-03-15 17:25:33,910][PID:25][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Running query results clean up (removing maximum of 100 unused results, that are 0 days old or more)
In my case I set it to clean up all query results to make it easy to verify.
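For context, the cleanup behavior is controlled by environment variables in /opt/redash/env. The values below are illustrative, not necessarily what any given instance uses; a MAX_AGE of 0 matches the "0 days old or more" in the log line above:

```shell
REDASH_QUERY_RESULTS_CLEANUP_ENABLED=true   # turn the periodic cleanup job on/off
REDASH_QUERY_RESULTS_CLEANUP_COUNT=100      # max unused results removed per run
REDASH_QUERY_RESULTS_CLEANUP_MAX_AGE=0      # minimum age in days; 0 = any unused result
```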
I have checked the logs since Feb 21st (that’s when Redash was upgraded to v10) and here are my findings:
If you check your docker logs, do you see a message like this from the scheduler service when it starts?
There are only a few entries with a message like this. My understanding is that I should be expecting to see it every time the task was starting, is that correct?
[2022-02-21 15:59:56,635][PID:1][INFO][redash.tasks.schedule] Scheduling e27209059575fcc17c527c47d0957cb21756e551 (cleanup_query_results) with interval 300.
[2022-02-22 10:43:58,998][PID:1][INFO][redash.tasks.schedule] Scheduling e27209059575fcc17c527c47d0957cb21756e551 (cleanup_query_results) with interval 300.
[2022-03-02 09:12:02,255][PID:1][INFO][redash.tasks.schedule] Scheduling e27209059575fcc17c527c47d0957cb21756e551 (cleanup_query_results) with interval 300.
[2022-03-09 14:23:05,815][PID:1][INFO][redash.tasks.schedule] Scheduling e27209059575fcc17c527c47d0957cb21756e551 (cleanup_query_results) with interval 300.
[2022-03-11 16:47:31,106][PID:1][INFO][redash.tasks.schedule] Scheduling e27209059575fcc17c527c47d0957cb21756e551 (cleanup_query_results) with interval 300.
Do you see any messages like this?
Yes, but not consistently. At first the task was scheduled a few times without sticking to its schedule, and then it stopped appearing altogether.
On Mar 2nd we received an alert for high disk space utilization from the Redash database and we started looking into it.
We realized that the cleanup task wasn’t running, but we couldn’t understand why, so we restarted all services and manually cleaned up some old cached query results.
Since then the task was being scheduled normally as expected, up until March 8th when it stopped again (I am not posting all the logs from that time but I can do so if needed).
Since then there were only a couple of entries in the logs. We saw the task being scheduled once after restarting Redash and then again nothing.
[2022-02-21 16:04:14,015][PID:229][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Running query results clean up (removing maximum of 100 unused results, that are 7 days old or more)
[2022-02-21 16:04:17,031][PID:229][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Deleted 100 unused query results.
[2022-02-22 10:48:11,001][PID:207][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Running query results clean up (removing maximum of 100 unused results, that are 7 days old or more)
[2022-02-22 10:48:12,699][PID:207][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Deleted 100 unused query results.
[2022-03-02 08:30:53,816][PID:4653][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=325de3b5237b058d20e318b14fbf4bee0aaf2f76 Running query results clean up (removing maximum of 100 unused results, that are 7 days old or more)
[2022-03-02 08:30:54,392][PID:4653][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=325de3b5237b058d20e318b14fbf4bee0aaf2f76 Deleted 100 unused query results.
[...]
[2022-03-08 10:18:46,643][PID:17436][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Running query results clean up (removing maximum of 100 unused results, that are 7 days old or more)
[2022-03-08 10:18:46,948][PID:17436][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Deleted 0 unused query results.
[2022-03-09 14:27:19,547][PID:212][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Running query results clean up (removing maximum of 200 unused results, that are 1 days old or more)
[2022-03-11 16:52:08,715][PID:219][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Running query results clean up (removing maximum of 200 unused results, that are 1 days old or more)
[2022-03-11 16:52:11,014][PID:219][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Deleted 200 unused query results.
I have restarted all the services a couple of times with no luck; the task still does not seem to be scheduled.
Not quite. This message is emitted exactly once, when the scheduler service starts, i.e. when you run docker-compose up -d. If you run docker-compose restart scheduler you should see the message again. As a debugging step, try restarting your scheduler and verifying that the cleanup task registration log appears.
That said, a different message will appear when the cleanup task actually runs. That different message will be emitted not by the scheduler, but by the worker.
Exactly right. This is normal. When the scheduler starts it emits logs to say which tasks it will schedule, and at what interval. The scheduler never executes tasks, it only schedules them. It does not emit a message every time it schedules a task.
So you might ask: “how will I know the task is running if the scheduler doesn’t tell me?” That’s where the worker logs come in. When a worker pulls a job off the queue, it emits a log message with the job id, the task name, and a timestamp. So in simple terms, what you should expect to see is:
00:00 scheduler: I’m going to schedule this task to run every five minutes
00:00 worker: I just picked up a task to run, I’m doing that now
00:01 worker: I finished the task, here’s the outcome
00:05 worker: I just picked up a task to run, I’m doing that now
00:06 worker: I finished the task, here’s the outcome
00:10 worker: I just picked up a task to run, I’m doing that now
00:11 worker: I finished the task, here’s the outcome
…and so on.
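That five-minute cadence is easy to verify mechanically from the worker logs. Here's a small sketch (log lines abridged; the timestamp format is Redash's standard log prefix) that parses the timestamps and measures the gap between consecutive runs:

```python
from datetime import datetime

def parse_ts(line):
    """Extract the leading [YYYY-MM-DD HH:MM:SS,mmm] timestamp from a Redash log line."""
    stamp = line.split("]")[0].lstrip("[")
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")

def run_intervals(lines):
    """Seconds between consecutive 'Running query results clean up' entries."""
    times = [parse_ts(l) for l in lines if "Running query results clean up" in l]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

# Abridged worker log lines like the ones in this thread:
logs = [
    "[2022-03-17 10:43:44,812][PID:32678][INFO][rq.job...] Running query results clean up",
    "[2022-03-17 10:48:45,050][PID:360][INFO][rq.job...] Running query results clean up",
]
print(run_intervals(logs))  # gaps close to 300 s mean the schedule is being honored
```

If the gaps are much larger than 300 seconds, or the list is suspiciously short for the time window, the task isn't keeping its schedule.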
One thing that may be happening is that the cleanup job is taking too long to execute. What are your settings for the number of results to clean up and for the interval?
the task seems to have both been registered and executed as expected.
Logs below are just to confirm that the task has been registered and has had 2 successful executions; other than that, I confirm that it’s been running consistently every 5 minutes, up until the time I am posting this message. It doesn’t make sense…
[2022-03-17 10:43:44,795][PID:1][INFO][redash.tasks.schedule] Scheduling e27209059575fcc17c527c47d0957cb21756e551 (cleanup_query_results) with interval 300.
[2022-03-17 10:43:44,812][PID:32678][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Running query results clean up (removing maximum of 200 unused results, that are 1 days old or more)
[2022-03-17 10:43:45,985][PID:32678][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Deleted 200 unused query results.
[2022-03-17 10:48:45,050][PID:360][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Running query results clean up (removing maximum of 200 unused results, that are 1 days old or more)
[2022-03-17 10:48:45,962][PID:360][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Deleted 200 unused query results.
We have set the following values in /opt/redash/env:
The interval hasn’t been changed; I couldn’t find any cleanup-related env vars for the interval, but from this I am guessing the interval is hardcoded to 5 minutes?
I’m looking forward to adding this breakdown to our documentation. Thanks for your patience while I gathered the info!
You’re right. I didn’t read closely enough (I’m updating the docs for Redash settings right now btw).
It sounds like for the moment your issue is resolved. But if you haven’t changed any settings it seems likely that the issue of database bloat will continue. I wonder if the cause ultimately is that the job is running but…
a) it crashes or hangs midway through
b) it runs normally, but the number of results to clean up is growing faster than the cleanup job can clear it
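Cause (b) can be ruled in or out with simple arithmetic. A sketch using the settings mentioned in this thread and a hypothetical daily volume of new unused results (the 80,000 figure is purely illustrative):

```python
# Throughput of the cleanup job with the settings from this thread:
cleanup_count = 200        # results removed per run (REDASH_QUERY_RESULTS_CLEANUP_COUNT)
interval_seconds = 300     # the job runs every 5 minutes

runs_per_day = 24 * 3600 // interval_seconds
max_deleted_per_day = cleanup_count * runs_per_day
print(max_deleted_per_day)  # 57600

# Hypothetical figure for illustration: if the instance generates more
# unused results per day than the job can delete, the table grows without bound.
new_results_per_day = 80_000
print(new_results_per_day - max_deleted_per_day)  # 22400 extra rows per day
```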
Hmmm, very strange not to see the registration, although it’s possible that waiting longer would have shown one. For consistency, I never run docker-compose up -d on its own to restart services. I run stop first, like this:
$ docker-compose stop && docker-compose up -d
This guarantees that everything stops and is restarted.
I don’t usually run docker-compose restart (without specifying a service) because it won’t guarantee that environment variable changes are propagated (not sure why; it’s a Docker issue, not specific to Redash).
Possible causes:
It could be an insufficient number of workers monitoring the periodic queue. If another periodic job is hogging the workers then the cleanup task might never run. To check for this, observe how many jobs are in the periodic queue when you stop seeing the task run. If the number stays static or increases, there aren’t enough workers.
The worker is crashing before it can emit the log saying it’s picked up the task. To check for this, don’t grep for cleanup_query_results, but instead see if there are signs of a worker thread crashing.
The scheduler stops scheduling the job. To check for this, you can connect directly to the redis instance and pattern search for cleanup_query_results in the redis database.
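The queue and scheduler checks above can be done directly against Redis. A sketch assuming the standard container names from the Redash docker-compose file, using the key layout RQ and rq-scheduler use (the job id is the one from the scheduler logs earlier in the thread):

```shell
# 1) How many jobs are waiting in the periodic queue? A number that only
#    grows suggests there aren't enough workers draining it.
docker-compose exec redis redis-cli llen rq:queue:periodic

# 2) Is the cleanup job still registered with the scheduler?
#    rq-scheduler keeps scheduled job ids in a sorted set:
docker-compose exec redis redis-cli zrange rq:scheduler:scheduled_jobs 0 -1

# 3) Inspect the job itself (id taken from the scheduler log above):
docker-compose exec redis redis-cli hgetall rq:job:e27209059575fcc17c527c47d0957cb21756e551
```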
Depending on which of those is the cause there would be different steps to fix.
Not sure. Keep an eye on it and ping what you find or if you have questions. We’ll get to the bottom of it I’m sure.
Circling back on this as the cleanup_query_results seems to have stopped running again.
Unfortunately we only noticed because we got a notification about the Redash DB running out of storage and realized it wasn’t running for the past month or so.
As before, the task was scheduled successfully after the service was restarted, ran once, and then nothing again. Redash has been restarted a couple of times since then to see if the job would run consistently, but I keep seeing the same behavior: it gets scheduled, runs once, then nothing.
[2022-08-05 07:11:25,916][PID:1][INFO][redash.tasks.schedule] Scheduling e27209059575fcc17c527c47d0957cb21756e551 (cleanup_query_results) with interval 300.
[2022-08-05 07:15:37,847][PID:212][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Running query results clean up (removing maximum of 500 unused results, that are 1 days old or more)
I can see that job details do correlate with the logs above on schedule and execution. It also seems it was enqueued to run again but there were no signs of it in the logs. Not sure how tasks are registered in Redis but I was also not able to find the specific key again after 15 minutes or so (and of course the cleanup job hasn’t run ever since).
Hi, I am checking to see if there is any follow-up available. We seem to have a similar issue: the “periodic” queue is scheduled and one of our four workers is busy, with three idle. However, the query_results do not seem to be getting cleaned up.