I faced a production issue while using Redash.
It could be that I didn't know everything about how Redash works,
and I also couldn't find clear documentation for the tool.

  • Why does Redash save data in a Postgres database if it is just caching the data?
  • Why can't I filter a large amount of data in Redash?

Thanks

I am still waiting for an answer :frowning:

Can you provide some more information?

Redash stores query results, but you can configure after how much time the data should be cleared.

On the filter part, please provide more details.

Redash caches the latest query result for each query. If you pull large datasets into Redash, its internal database will grow.

Redash is built to plot aggregated data sets. That means your query results should stay around ~20 MB for optimal performance. As your query results grow, performance will deteriorate in two ways:

  1. Your query runners may not have sufficient resources to hold the query result in RAM before it is cached in the internal database.
  2. Your browser can choke if you load many megabytes of data at once.

It’s not clear why your hard disk is full. Did you investigate what is taking up the space?


Yes, our resources were not enough to run the queries,
and that’s why we increased both the memory and the hard disk.

Is there any clear documentation on how Redash works, but in more detail?

It was not clear to me that Redash stores the data itself, not just the query, on the hard disk.
In my case, the disk filled up suddenly because of a huge query.

On the filter part: the Redash documentation mentions that filters should be used for smaller data sets only.

What specifically do you want to know apart from what has already been written?

Seems like there’s a misunderstanding in this thread about what “filtering” means.

  • Filtering in general is any SQL statement that includes a WHERE clause. These are required when working with large data sets in Redash. You’ll crash the app if you SELECT * FROM a large data set without any filtering applied.
  • Query Filters are a Redash feature that requires you to alias your columns with ::filter. If a query returns 3k results, a Query Filter will work fine. If you load a large data set (bigger than ~20 MB) that includes a Query Filter, you will hit performance limits.

Try running a query that includes LIMIT 1000 at the end. If you still have performance issues, then we have something to investigate. But if you are trying to load 100k rows at once, we would fully expect this to cripple your instance.
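As a quick illustration (the table and column names here are made up), an aggregated, filtered, limited query is the shape Redash handles best, and a small aggregated result is also where the ::filter alias works well:

```sql
-- Hypothetical table/columns, for illustration only.
SELECT
    status,
    count(*) AS total,
    status AS "status::filter"                     -- Redash Query Filter alias on a small result set
FROM orders
WHERE created_at >= now() - interval '7 days'      -- filter in SQL, not in the browser
GROUP BY status
LIMIT 1000;                                        -- cap the result size as a safety net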


For example, I didn’t know when Redash stores the data in the db.
For how long does Redash cache the data?

These env variables are not clear to me,
and I am trying to find documentation to understand how and when to use them:

REDASH_CELERY_TASK_RESULT_EXPIRES
REDASH_QUERY_RESULTS_CLEANUP_ENABLED
REDASH_QUERY_RESULTS_CLEANUP_COUNT
REDASH_QUERY_RESULTS_CLEANUP_MAX_AGE

It’s hard for someone else to help you out without knowing how you installed Redash. In my experience, when disk space fills up there are two common situations: either the database log file has grown too large, or the query_results table in Redash’s PG database has grown too large. My guess is that the second is more likely. In that case, it is usually necessary to enlarge the partition where the PG database’s disk files are located. Because Redash’s table design uses foreign key associations, you cannot simply delete the query_results table’s data directly; forcing a deletion may break those references.
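If you want to confirm which table is taking the space, a standard Postgres size query (run against Redash’s internal database) can help:

```sql
-- Run against Redash's internal Postgres database.
SELECT pg_size_pretty(pg_total_relation_size('query_results')) AS query_results_size;
```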


Waiting for your answer.

And thank you for your support :slight_smile:

That’s right.
This is what Amazon suggested I do.

So I extended the hard disk partition and also the memory,
but it was a production issue and we couldn’t access the server for some time
until we found the problem.

What answer are you waiting on? I don’t see any outstanding questions here :smiley:

Can you see my questions now?
I wrote them twice.

1- For how long does Redash cache the data,
and how can I change that?

2- These environment variables are not clear to me,
and I am trying to find documentation to understand their usage and definitions:

REDASH_CELERY_TASK_RESULT_EXPIRES
REDASH_QUERY_RESULTS_CLEANUP_ENABLED
REDASH_QUERY_RESULTS_CLEANUP_COUNT
REDASH_QUERY_RESULTS_CLEANUP_MAX_AGE

Have you looked at the documentation? E.g. https://redash.io/help/open-source/admin-guide/env-vars-settings
This gives you the default settings for most env vars and some descriptions.

REDASH_CELERY_TASK_RESULT_EXPIRES is described as “How many seconds to keep Celery task results in cache (in seconds)”, but note that from Redash 9 Celery has been replaced with RQ, so there will probably be a different way to manage this in that version.

REDASH_QUERY_RESULTS_CLEANUP_ENABLED
REDASH_QUERY_RESULTS_CLEANUP_COUNT
REDASH_QUERY_RESULTS_CLEANUP_MAX_AGE
I think these are fairly self-explanatory: the first one enables cleanup of query results, COUNT sets how many unused results are removed in each cleanup run, and MAX_AGE sets the retention period before results are aged out. The default setting is 7, so this is probably 7 days.
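Putting that together, a cleanup configuration might look like this (the values are illustrative; verify the defaults for your Redash version on the env-vars settings page):

```shell
# Illustrative values only - check the defaults for your Redash version.
export REDASH_QUERY_RESULTS_CLEANUP_ENABLED=true   # turn the periodic cleanup job on
export REDASH_QUERY_RESULTS_CLEANUP_COUNT=100      # unused results removed per cleanup run
export REDASH_QUERY_RESULTS_CLEANUP_MAX_AGE=7      # days an unused result is kept before cleanup
```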

Thank you for your comment.

Yes, that link is actually where I found these variables.

  • I faced a problem in production; that’s why I am asking about the exact unit and meaning of each variable.

REDASH_QUERY_RESULTS_CLEANUP_MAX_AGE
Is it measured in days or in queries?

REDASH_QUERY_RESULTS_CLEANUP_COUNT
Is it a number of results or an amount of memory space?

I want to be sure about the meaning and the unit of each one,
and the default time the data is kept in memory.

For that level of detail I’d suggest having a scan of the Redash source code, which is publicly available on GitHub.

For example searching for REDASH_QUERY_RESULTS_CLEANUP_MAX_AGE leads me to this page, which provides a bit more info about how these values are used in the cleanup_query_results function.
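For intuition, here is a simplified sketch (not the actual Redash source) of what a cleanup routine like that does: delete up to CLEANUP_COUNT of the oldest unused results that are older than CLEANUP_MAX_AGE days.

```python
from datetime import datetime, timedelta

def cleanup_query_results(results, max_age_days=7, count=100, now=None):
    """Illustrative sketch of Redash-style cleanup, not the real implementation.

    `results` is a list of dicts with "id" and "retrieved_at" keys.
    Deletes up to `count` results whose retrieved_at is older than
    `max_age_days` days, oldest first, and returns the survivors.
    """
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=max_age_days)
    # Results past the retention window, oldest first.
    stale = sorted(
        (r for r in results if r["retrieved_at"] < cutoff),
        key=lambda r: r["retrieved_at"],
    )
    doomed = {r["id"] for r in stale[:count]}  # cap deletions per run
    return [r for r in results if r["id"] not in doomed]
```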
