It takes a very long time (20 minutes or more) to display the query results, even though the result itself is only about 200 MB.
I guess the calculation itself is not heavy, but displaying it may take time because the output is close to 1 million records.
- Is there an upper limit to the output that can be displayed?
- Is there anything I can do to improve this?
That’s the problem. 200 MB isn’t much for a computer, but it’s a monster for a front-end web application to plot to SVG.
Aggregate your results in the query. Realistically, if you aggregated your data by a factor of 10, the plot would look identical anyway.
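For example, the aggregation can happen in the query itself rather than in the browser. A minimal sketch, assuming a hypothetical table `events` with columns `event_time` and `amount` (all names are placeholders, not from the original thread):

```sql
-- Hedged sketch: roll raw rows up to daily buckets before plotting,
-- so the browser receives thousands of rows instead of ~1 million.
SELECT
  TIMESTAMP_TRUNC(event_time, DAY) AS day,
  COUNT(*)                         AS row_count,
  SUM(amount)                      AS total_amount
FROM events
GROUP BY day
ORDER BY day;
```

The plot drawn from the daily buckets is visually indistinguishable from one drawn from the raw rows, but the payload shipped to the front-end shrinks by orders of magnitude.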
Thank you very much!
I finally understood where the problem was…
Do you mean I should only aggregate, and show the aggregated result in the web browser?
But sometimes I have to get the result as a table (more than 1 million records), and reference that table via Query Results from another query.
e.g. the aggregate query C reads from the tables produced by queries A and B.
Then queries A and B on the called side aren’t updated automatically; when I run query C, they don’t run at the same time, so I can’t get up-to-date results.
The problem is I have to run queries A and B first to get query C’s aggregated result. (This process includes displaying A and B’s big table results in the web browser…)
What should I change or improve in this process?
I see. What you’re describing is an ETL situation. You can use QRDS for this within Redash. But if you have that much data you would be better served with a proper ETL solution, and then pointing Redash at that.
The Query Results Data Source is not intended for joining millions of records from different sources. And now you can see why: it doesn’t perform well in the front-end.
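For context, in Redash’s Query Results data source each saved query’s result is exposed as a table named `query_<id>`. A hedged sketch of the pattern being described (the query IDs and column names here are placeholders):

```sql
-- QRDS sketch: query C joins the results of queries A (id 123) and B (id 456).
-- Redash must first materialize the FULL result set of each referenced query,
-- which is why this pattern breaks down at millions of rows.
SELECT a.user_id,
       SUM(a.total)  AS total,
       MAX(b.segment) AS segment
FROM query_123 AS a
JOIN query_456 AS b ON a.user_id = b.user_id
GROUP BY a.user_id;
```

Even though the final output is small, the intermediate `query_123` and `query_456` result sets are loaded in full before the join runs.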
Thank you very much! I understood.
You mean we have to go through ETL between BigQuery and Redash.
Are there any commonly used or recommended ETL tools?
Wait, is all the data already in BigQuery? If so, why not aggregate it in BigQuery rather than with QRDS?
Or is some of the data housed elsewhere?
Thank you for pointing that out.
I’m starting to understand the correct process little by little…
We should make a new dataset dedicated to aggregation, create the aggregation tables in that dataset, and then connect this new dataset to Redash. Right?
We don’t have any data outside BigQuery.
I thought it was necessary to use an ETL tool like Dataflow, but apparently it’s not.
Exactly. You just want to use the right tool for the job. Redash can combine small data sets. But if all the data is already in BQ then it makes most sense to leverage that power.
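Under that plan, the heavy join and aggregation stay inside BigQuery and Redash only ever reads the small result. A sketch, assuming hypothetical dataset/table names (`raw_data.events`, `reporting.daily_summary` are placeholders):

```sql
-- Hedged BigQuery sketch: materialize the aggregate into a dedicated
-- "reporting" dataset, then point a Redash query at it directly.
CREATE OR REPLACE TABLE reporting.daily_summary AS
SELECT
  TIMESTAMP_TRUNC(event_time, DAY) AS day,
  COUNT(*)                         AS row_count,
  SUM(amount)                      AS total_amount
FROM raw_data.events
GROUP BY day;
```

The Redash query then becomes a simple `SELECT * FROM reporting.daily_summary`, so nothing close to a million rows ever reaches the front-end.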