Hi, we are using Redash version redash:9.0.0-beta.b42121 deployed with docker-compose, and Redash continues to fill the disk despite our passing all the environment variables required for clearing the disk. Can someone help?
Below are the env variables we are using.
There is an open question as to whether V9 and V10 have an issue with maintenance tasks not running on schedule. There is an in-depth discussion of this here. To see whether this affects you, can you examine your docker logs to check whether the cleanup job is running at all?
So can you suggest a possible solution for this, or should we upgrade to V10? Also, if we need to upgrade, can you share the upgrade steps from the beta to the stable release?
I am getting the following in the docker logs:
worker_1 | [2022-05-18 17:09:27,503][PID:3257][INFO][rq.worker] Result is kept for 120 seconds
worker_1 | [2022-05-18 17:09:37,474][PID:17][INFO][rq.worker] periodic: e27209059575fcc17c527c47d0957cb21756e551
worker_1 | [2022-05-18 17:09:37,481][PID:3258][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Running query results clean up (removing maximum of 10000 unused results, that are 2 days old or more)
worker_1 | [2022-05-18 17:09:37,609][PID:3258][INFO][rq.job.redash.tasks.queries.maintenance] job.func_name=redash.tasks.queries.maintenance.cleanup_query_results job.id=e27209059575fcc17c527c47d0957cb21756e551 Deleted 32 unused query results.
scheduled_worker_1 | [2022-05-18 17:17:30,284][PID:2201][INFO][root] Updated 1 queries with result (f7d299a3f28039e90eede2be622316e1).
First, it looks like the cleanup job is working as expected. It’s removing unused results.
Which raises the question: what is growing the disk usage, and is that normal? How many queries are you running regularly? Is it the query_results table that grows? What I’m getting at is that there are use cases of Redash where you would expect disk usage to grow: e.g. if you are running more and larger queries that return many thousands of rows.
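If you’re not sure which table is growing, a quick check against the Redash Postgres database should tell you (this is a plain Postgres catalog query, nothing Redash-specific):

-- total size including indexes and TOAST, largest tables first
select relname,
       pg_size_pretty(pg_total_relation_size(relid)) as total_size
from pg_statio_user_tables
order by pg_total_relation_size(relid) desc
limit 10;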
While I usually recommend upgrading, I don’t believe that doing so will “fix” this, because it’s not clear whether we have a problem at all. If you want to upgrade, there are detailed instructions on GitHub under our releases section here: Release v10.1.0 · getredash/redash · GitHub
Yes, the usage is pretty high, but considering the value we set for the variable below, the size of the data is pretty unusual!
REDASH_QUERY_RESULTS_CLEANUP_MAX_AGE=2
Not really, since it cleans up unused query results. If you’re actually using all those query results then it’s not going to affect them.
i.e. here’s the job description directly from the source code:
def cleanup_query_results():
    """
    Job to cleanup unused query results -- such that no query links to them anymore, and older than
    settings.QUERY_RESULTS_CLEANUP_MAX_AGE (a week by default, so it's less likely to be open in someone's browser and be used).

    Each time the job deletes only settings.QUERY_RESULTS_CLEANUP_COUNT (100 by default) query results so it won't choke
    the database in case of many such results.
    """
retrieved_at doesn’t really make a difference. See my edit above that includes the specification for the cleanup job: if the results are being referenced by an existing query they won’t be deleted no matter how old they are.
So I’ll ask again for the third time: are you actually pulling sizable amounts of data into Redash? Because if you are it’s not surprising that the query results table would be this large.
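One quick way to check is to look at the largest individual results sitting in the table right now (again a sketch, assuming results are stored as text in the data column of query_results):

-- ten largest stored results by raw size
select query_hash, retrieved_at,
       pg_size_pretty(octet_length(data)::bigint) as result_size
from query_results
order by octet_length(data) desc
limit 10;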
No, we are not fetching such large amounts of data.
Will running VACUUM FULL on query_results help? As far as I can see, autovacuum is set to on, but it does not release the disk space.
I checked which file the query_results table is writing to on the server using the query below:
select pg_relation_filepath('query_results');
But I can see the file is only 1 GB in size, whereas the table size is more than 600 GB!
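A side note on those numbers: pg_relation_filepath only points at the first segment of the table’s main data file. Postgres splits large tables into 1 GB segment files, and large values in the data column are typically stored out-of-line in a separate TOAST relation, so a single 1 GB file doesn’t contradict a 600 GB table. To see where the space actually is (plain Postgres size functions):

-- total size vs. the main heap; the difference is mostly TOAST and indexes
select pg_size_pretty(pg_total_relation_size('query_results')) as total,
       pg_size_pretty(pg_relation_size('query_results')) as main_heap,
       pg_size_pretty(pg_total_relation_size('query_results')
                      - pg_relation_size('query_results')) as toast_and_indexes;

As for VACUUM FULL: it rewrites the table and does return space to the operating system, but it takes an exclusive lock on query_results for the duration, so it’s worth measuring the table first.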
You need to be specific here. The only thing that makes sense from what you’ve described is that you have either lots of queries that return a small amount of data, or a few queries returning large amounts of data.
If you wrote that you only have five queries and they only return 5 rows of data each, then we could naturally assume there is a problem here. But you’ve written that your Redash usage is high. What does that mean? How many queries? How often do they run? Do many of your queries use parameters? How many results are on the query results table? How many queries are there in your Redash instance?
So far you haven’t given enough detail to determine whether your query_results table is actually a problem. In other words: it doesn’t matter if query_results is a large table. In isolation that can be perfectly normal. What matters is whether the size of the table differs from the size you expect it to be. Until we know what to expect, we can’t say the size of the table is an issue. Make sense?
I think before we consider whether there’s a default configuration error with Redash (pretty unlikely) we need to nail down exactly how much data we expect to see on your query results table. This is not a hard question to answer.
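For example (another sketch, assuming the standard query_results schema with results stored as text in the data column), the following would give the raw numbers to compare against your expectations:

-- how many results are stored, how much raw result data, and over what time span
select count(*) as result_rows,
       pg_size_pretty(sum(octet_length(data))::bigint) as raw_result_data,
       min(retrieved_at) as oldest_result,
       max(retrieved_at) as newest_result
from query_results;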