Enforce to add partition key and limit clause

AWS Athena increases the usage fee depending on the amount of data loaded.
To control this, I’m adding the following patch:
If a table with a partition key set forces you to specify it in the where clause when referring to the table. SQL that violates this will be canceled before being submitted to Athena.

Similarly, I apply a patch to force all SQL to have a LIMIT clause to keep Redash’s memory usage below a certain level.

For example, add a checkbox to enforce the Partition Key in places like the attached image.

Hello [ytannai],
Thanks t alot for your valuable info. I’m looking similar stuff for Redshift data source. We use Redshift as our primary data source and we expose data via redash to users. Quite often, people ignore limit clause and it blows up our memory. Could you please share some info if you have it!

Thanks in advance!

Thanks and Regards,
Kylasam N A

Hi @kylashpriya

Our primary data source is AWS Athena and was faced same issue with you.
There were some users that cause memory usage issue for redash because of fetch many rows (over 100k).
Consequently, some patches like below was applied to self hosted redash.

def __check_limit(self, query):
    query = sqlparse.format(query, keyword_case='upper', identifier_case='lower')
    parsed = sqlparse.parse(query)
    select_flag = False
    limit_flag = False
    for t in parsed[0].tokens:
        if str(t) == 'SELECT':
            select_flag = True
        if str(t) == 'LIMIT':
            limit_flag = True
    if select_flag and limit_flag:
        return
    elif not select_flag:
        return
    elif select_flag and not limit_flag:
        raise Exception("required limit")

I posted same question to AWS Support. They saied that Athena and official libraries provide some feature to fix the problem:
For example, RowsToFetchPerBlock and UseResultsetStreaming in Athena JDBC Driver can control row counts.
Athena’s GetQUeryResults API can also control. (doc)
But redash (and related libraries) don’t use it.

I know the default row count is 1000 when LIMIT is absent in current redash, but some user writes a query with the large LIMIT count.
So I need the LIMIT count enforcing feature.

For Alternative, I try to write small proxy server to intercept LIMIT-less query just now.

Sorry for Athena only reply because we don’t use Redshift.

1 Like