It takes a very long time to display the query results

calicocookie · June 21, 2022, 6:26am

It takes a very long time (20 minutes or more) to display the query results, even though the calculation itself is about only 200MB.
I guess the calculation itself is not heavy, but it may take time to display it because the output size is close to 1 milion records.

-Is there an upper limit to the output that can be displayed?
-Is there anything I can do to improve?

jesse · June 21, 2022, 4:07pm

That’s the problem. 200mb isn’t much for a computer, but it’s a monster for a front-end web application to plot to SVG.

Aggregate your results in the query. Realistically, if you aggregated your data by a factor of 10 the plot would look identical anyway.

Redash is built to visualise aggregated data sets around ~20mb in size (most query results are dramatically smaller than that). There’s no hard upper-limit. It just depends how beefy your computer / browser are. For example if you run on an M1 Macbook (which has the fastest javascript execution engine on the planet) you’ll have better luck than on an Intel i3 from 2012 XD

calicocookie · June 22, 2022, 1:07am

Thank you very much!
I finally understood where the problem was…

Do you mean I should only do aggregate and show aggregated result on web browser ?
But sometimes I have to get result as table (more than 1 milion records) , and call the table as QueryResults from other query.

ex: from aggregate queryC calls table queryA&queryB.
Then, the queriesA&B on the called side aren’t updated automatically, when I run queryC, they don’t run at the sametime. I cant get up to date result.
The problem is I have to run queriesA&B at first to get queryC’s aggregated result. (This process include showing queriesA&B’s big table result on web browser…)

Where should I devise or improve in these process?
Thank you.

jesse · June 22, 2022, 5:49pm

I see. What you’re describing is an ETL situation. You can use QRDS for this within Redash. But if you have that much data you would be better served with a proper ETL solution, and then pointing Redash at that.

The Query Results Data Source is not intended for joining millions of records from different sources. And now you can see why: it doesn’t perform well in the front-end.

calicocookie · June 23, 2022, 8:15am

Thank you very much! I understood.

You mean we have to go through ETL between Bigquery and Redash.
Are there any commoly used ETL or recommended ELT?

Thank you.

jesse · June 23, 2022, 3:31pm

Wait, is all the data already in BigQuery? If so, why not aggregate it in BigQuery rather than with QRDS?

Or is some of the data housed elsewhere?

calicocookie · June 27, 2022, 6:08am

Thank you for your pointing out.

I may understand what is the correct process little by little…
We should make new dataset dedicated to aggregation, and make aggregation tables in this dataset, then, connect this new dataset to Redash. Right?

We don’t have any data other than Bigquery.
I thought it was necessary to use ETL like Dataflow, but apparently it’s not necessary.

jesse · June 27, 2022, 6:10am

Exactly. You just want to use the right tool for the job. Redash can combine small data sets. But if all the data is already in BQ then it makes most sense to leverage that power.

calicocookie · June 27, 2022, 6:15am

Thanks a lot
I will try!

▼Categories

▼Tags

It takes a very long time to display the query results