I faced a production issue while using Redash. It may be that I didn't know everything about how Redash works, and I also couldn't find clear documentation for the tool.
Why is Redash saving the data in a Postgres database if Redash is just caching the data?
Redash caches your latest query result for each query. If you pull large datasets into Redash then its internal database will grow.
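For context, those cached results live in the query_results table of Redash's internal Postgres database. A rough way to see what is stored there and how big each cached result is (column names are from the Redash schema as I understand it; verify against your version):

```sql
-- Run against Redash's internal Postgres database, not your data source.
-- Each row in query_results holds one cached result; "data" is the payload.
SELECT id,
       query_hash,
       retrieved_at,
       pg_column_size(data) AS result_size_bytes
FROM query_results
ORDER BY retrieved_at DESC
LIMIT 5;
```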
Redash is built to plot aggregated data sets, which means your query results should stay under ~20 MB for optimal performance. As your query results grow, performance will deteriorate in two ways:
Your query runners may not have sufficient resources to hold the query result in RAM before it is cached in the internal database.
Your browser can choke if you bring many megabytes of data at once.
It’s not clear why your hard disk is full. Have you investigated what is taking up the space?
It was not clear to me that Redash stores the query results themselves, not just the queries, on the hard disk. In my case the server suddenly locked up because a huge query filled the hard disk.
In the filters section of the Redash documentation, it's mentioned that filters should be used with smaller data sets only.
Seems like there’s a misunderstanding in this thread about what “filtering” means.
Filtering in general means restricting a SQL statement with a WHERE clause. This is required when working with large data sets in Redash: you'll crash the app if you SELECT * from a large table without any filtering applied.
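For example (table and column names here are hypothetical):

```sql
-- Unfiltered: pulls the entire table through Redash and can crash the app.
-- SELECT * FROM events;

-- Filtered: a WHERE clause keeps the result set at a manageable size.
SELECT user_id, action, created_at
FROM events
WHERE created_at >= NOW() - INTERVAL '7 days';
```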
Query Filters are a feature of Redash that require you to alias your columns with ::filter. If you have a query that returns 3k results, a Query Filter will work fine. If you load a large data set (bigger than ~20mb) that includes a Query Filter then you will hit performance limits.
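As a minimal sketch of the aliasing syntax (events and action are placeholders):

```sql
-- Aliasing a column with ::filter turns its distinct values into a
-- dropdown filter in the Redash UI, applied to the cached result.
SELECT action AS "action::filter",
       COUNT(0) AS total_count
FROM events
GROUP BY action;
```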
Try running a query that includes LIMIT 1000 at the end. If you still have performance issues then we have something to investigate. But if you are trying to load 100k rows at once then we fully would expect this to cripple your instance.
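Something like this, with your own table in place of the placeholder:

```sql
-- Cap the result set so the query runner and browser stay responsive.
SELECT *
FROM events  -- placeholder table name
LIMIT 1000;
```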
It's hard for someone else to help without knowing how you installed Redash. In my experience, when disk space is full there are two usual culprits: the database log files have grown too large, or the query_results table in Redash's internal PG database has grown too large. My guess is that the second is more likely here. In that case it is usually necessary to enlarge the partition holding the PG database's data files. Because Redash's table design uses foreign key associations (queries reference their latest cached result in query_results), you cannot simply delete rows from query_results; forcing a deletion risks breaking those references.
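Before resizing anything, it's worth confirming which tables are eating the disk. A standard Postgres size query, run against Redash's internal database, should show this:

```sql
-- Lists the ten largest tables (including indexes and TOAST data).
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```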
That's right. This is what Amazon suggested I do, so I extended the hard disk partition and also increased the memory. But it was a production issue, and we couldn't access the server for some time until we identified the problem.
REDASH_CELERY_TASK_RESULT_EXPIRES is described as “How many seconds to keep Celery task results in cache (in seconds)”. Note that from Redash 9, Celery has been replaced with RQ, so there will probably be a different way to manage this in that version.
REDASH_QUERY_RESULTS_CLEANUP_ENABLED
REDASH_QUERY_RESULTS_CLEANUP_COUNT
REDASH_QUERY_RESULTS_CLEANUP_MAX_AGE
I think these are fairly self-explanatory: the first one enables cleanup of query results, COUNT caps how many unused results are cleaned up per run, and MAX_AGE sets the retention period before they are aged out. The default is 7, so this is probably 7 days.
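As a sketch, these settings go in Redash's environment file; the values below are illustrative rather than recommendations (check the defaults for your version):

```
REDASH_QUERY_RESULTS_CLEANUP_ENABLED=true
# Max number of unused results cleaned up per run (assumed default: 100)
REDASH_QUERY_RESULTS_CLEANUP_COUNT=100
# Retention period, probably in days (default: 7)
REDASH_QUERY_RESULTS_CLEANUP_MAX_AGE=7
```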
For that level of detail I'd suggest having a scan of the Redash source code, which is publicly available on GitHub.
For example searching for REDASH_QUERY_RESULTS_CLEANUP_MAX_AGE leads me to this page, which provides a bit more info about how these values are used in the cleanup_query_results function.