Our primary data source is AWS Athena and was faced same issue with you.
There were some users that cause memory usage issue for redash because of fetch many rows (over 100k).
Consequently, some patches like below was applied to self hosted redash.
def __check_limit(self, query):
query = sqlparse.format(query, keyword_case='upper', identifier_case='lower')
parsed = sqlparse.parse(query)
select_flag = False
limit_flag = False
for t in parsed.tokens:
if str(t) == 'SELECT':
select_flag = True
if str(t) == 'LIMIT':
limit_flag = True
if select_flag and limit_flag:
elif not select_flag:
elif select_flag and not limit_flag:
raise Exception("required limit")
I posted same question to AWS Support. They saied that Athena and official libraries provide some feature to fix the problem:
For example, RowsToFetchPerBlock and UseResultsetStreaming in Athena JDBC Driver can control row counts.
Athena’s GetQUeryResults API can also control. (doc)
But redash (and related libraries) don’t use it.
I know the default row count is 1000 when LIMIT is absent in current redash, but some user writes a query with the large LIMIT count.
So I need the LIMIT count enforcing feature.
For Alternative, I try to write small proxy server to intercept LIMIT-less query just now.
Sorry for Athena only reply because we don’t use Redshift.