In Redash v8 Beta, the Databricks Delta data source schemas were cached and readily available whenever a user logged in. We have since upgraded to Redash 10.1, the latest version. After the upgrade, the data source schemas are no longer cached: each time a user logs in, a get_schema queue job is triggered to fetch the latest schema. Because we have a lot of tables in Databricks Delta, this means a wait of 4-5 minutes.
Is this behavior intended? Was a change made in Redash to stop caching some data?
Technical details:
Redash Version: 10.1
Browser/OS: Chrome
How did you install Redash: Self-hosted, using official Docker images on AWS ECS
For background, please read the documentation for the Databricks query runner. The behaviour of the schema browser is specifically discussed.
This isn’t strictly correct. The schema is cached for one hour, which is the same for all data sources in Redash.
It’s not just when the user logs in. If you switch to a different endpoint or database, a fresh schema is fetched as well. Given the scale of Databricks endpoints, it isn’t feasible to fetch the entire schema across the whole endpoint, so Redash only fetches schemas when a user clicks to access them.
Yes. The previous approach (with a non-custom schema browser component) severely limited what could appear in the schema browser because an administrator would need to configure exactly which schemas were fetched, which made it inflexible for users creating new schemas or tables.
The custom schema browser component allows you to navigate the schema browser ad hoc, but it creates the case where Redash doesn’t know which schemas to fetch in advance. The workaround is to automate a network request to Redash that periodically kicks off a schema refresh job to keep certain schemas fresh. You can do this with curl and a routine cron job.
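As a rough illustration of that workaround, here is a minimal Python sketch of the request a scheduled job could send. It assumes the generic `/api/data_sources/<id>/schema?refresh=true` endpoint and the `Authorization: Key <api_key>` header; the base URL, data source ID, and function names are placeholders made up for this example. The Databricks schema browser may call different, per-database endpoints, so check the requests your browser makes in the network tab and adjust the path to match.

```python
import urllib.request


def build_schema_refresh_request(base_url, data_source_id, api_key):
    """Build a GET request asking Redash to refresh one data source's schema.

    Assumption: the generic schema endpoint with `refresh=true` is the right
    one for your setup. Swap in the path you observe in the browser if not.
    """
    url = f"{base_url.rstrip('/')}/api/data_sources/{data_source_id}/schema?refresh=true"
    # Redash API keys are sent in the Authorization header, prefixed with "Key".
    return urllib.request.Request(url, headers={"Authorization": f"Key {api_key}"})


def refresh_schema(base_url, data_source_id, api_key, timeout=30):
    """Send the refresh request and return the HTTP status code."""
    request = build_schema_refresh_request(base_url, data_source_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A small script that calls `refresh_schema(...)` for each data source you care about can then be scheduled with cron at an interval shorter than the one-hour cache lifetime.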
While this isn’t as easy as it was in V8, it’s a great deal more flexible. Long term, it would be nice to have this built into Redash as a custom task definition.