Schema loader painfully slow with many tables

ryank · August 18, 2016, 4:41pm

We are wondering a good way to disable to schema viewer in the query window.

Our database has hundreds of tables, which causes ~ 5-10 second lag when opening any query URL

Perhaps a way to disable the javascript for that section? Would be fine with a hacky solution for now.

Cheers.

arikfr · August 18, 2016, 7:31pm

I think the easiest way for you will be to disable schema fetching in the server code as it’s easier to tinker with as you can change it “in place”.

There are two places where you can disable schema loading:

The API handler: the code is very obvious. Just change it to return an empty array ([]). If you have multiple data sources in your instance and you want to disable it only for one of them, you can add some logic there to return empty array only for specific requests.
In the query runner: each query runner has a get_schema or _get_tables method (depends on which class it inherits). You can change it to return an empty array. The benefit of doing this at the query runner level is that you won’t needlessly query your database for the schema.

If you don’t mind Redash querying for the schema in the background, then go for option #1.
If you go for option #2, then make sure to make the change in the correct query runner and delete the Redis cache key for the schema (data_source:schema:<data source id>).

We’re planning to improve the handling of large schemas by avoiding loading all the schema in the frontend, but this will be only in a release or two from now.

arikfr · August 19, 2016, 8:43pm

Update: next version is going to have better handling of large schemas: https://github.com/getredash/redash/pull/1246.

I tested it with 1669 tables and 38K columns. Do you know how many you have?

ryank · August 19, 2016, 8:45pm

~3500 tables, probably around that many columns.

arikfr · August 30, 2016, 3:59pm

I think it should be OK even with 3500 tables.

yishikawa · June 8, 2022, 8:57am

I cannot load schema of 3979 tables from BigQuery.
It stopped loading without an error, everytime in 5 minutes from starting.
As another connection with 1000 tables works well, loading in 3 minites,
I dought some timeout setting is within 300 sec, but couldn’t find in document,
I’ve already set env following

REDASH_BIGQUERY_HTTP_TIMEOUT=1800
REDASH_SCHEMAS_REFRESH_SCHEDULE=360

Are there another timeout setting around schema loading?

I’m using Redash V10, in Ubuntu docker enviroment.

jesse · June 8, 2022, 2:16pm

How many tokens are there in this schema? The number of tables isn’t the complete picture since one table could have 3k columns e.g.

yishikawa · June 9, 2022, 12:58am

Hi, @jesse

What do you mean by “token”, and how can I check the amount of tokens?

Yes, we have tables with near 100 columns.
These tables also have _table_suffix of date, at the same time,
and 1000 tables are there with the same schema.

example 1)
Google Analytics 4’ exported table

analytics_*****.events_20220609
analytics_*****.events_20220608
…

example 2)
Google App Engine’s lof
appengine_log.appengine_googleapis_com_request_log_20220609
appengine_log.appengine_googleapis_com_request_log_20220608
…

If I can skip loading these suffix tables,
under 500 tables are in these schemas.

jesse · June 9, 2022, 9:49pm

A table name is a token. A column name is a token. The total number of tokens is the sum of all the table names + the sum of all the column names. If you have 3k tables with 100 columns each that totals 303k tokens. The documentation advises that if your database has more than 5k tokens, you should disable schema loading entirely from the data source settings screen.

yishikawa · June 14, 2022, 7:48am

Let me share my walk around.

create new service account
upload JSON Key File to your data source setting
go to BQ console
a. add “BigQuery Data Viewer” to Dataset Permissions if the schema doesn’t contain “suffix-table”
b. add “BigQuery Data Viewer” to Table Permissions if the schema contain “suffix-table”
go “new query” and reflesh schema, then you see what you want

In case b, you can query table but schema loader doesn’t waste tokens for these tables and columns