What is the preferred way to handle very large query results?

cemremengu · February 11, 2020, 10:15pm

In our app we have very large datasets since the user wants to see the chart for the last 3 months (IoT device measurements). However, query errors after like 15 seconds (postgres says unexpected eof, connection terminated etc.).

What is the preferred way for such results and what are the limits?

Another question I am curious is; why did you prefer to store query results as json text instead of individual records in collections so that they can be iterated over? This would also enable to iterate over the cursor while creating the query results so you wont need to fetch everything in memory.

jesse · February 13, 2020, 5:01am

The idea with Redash is to aggregate with the database. To chart IoT data going back several months you can group by day, week, or month and Redash will fly. If you want to smooth the line further then you can use window functions and running totals.

It’s really not meant for lazy loading or pre-aggregation. There are other tools purpose built for such high performance work.

There is an open issue on Github to improve handling for large results. And your proposed solution is among the options. Until that’s implemented however the best advice is to let your database do the heavy lifting.

jesse · February 13, 2020, 5:11am

Sorry to double post: just realized I missed your other question about limits.

There are two kinds:

Size of the query result exceeds query runner’s memory limit, so the task runner kills the process (what you’re seeing right now)
Number of results is too large for your browser to render. So the tab crashes or performance degrades considerably.

In general the query runner can handle data in excess of 100mb. This might happen if you have QRDS queries that merge large data sets. But the browser will really struggle if the query result exceeds 20mb.

If you’re business case absolutely requires charting all the data points then Redash probably isn’t right for you. But that’s a rare requirement in our experience.

▼Categories

▼Tags

What is the preferred way to handle very large query results?