Thoughts on adding support for queries with parameters in query results data source


#1

Following @ariarijp pull request on changing the way we find query IDs in Query Results queries to later support queries with parameters, I wanted to start a discussion on how this might work and to share some thoughts I had:

There are two use cases for querying from queries with parameters:

1.

The simple use case: you just want to use a query with parameters and you “know” what the values are:

  • We can detect that the query you’re querying from needs parameters and show the needed parameter input fields and pass those values along when loading this query (will require adding the logic of detecting query IDs in client side code).
  • Allow for some syntax of passing parameters like @ariarijp suggested (although I feel that using JSON is not very user friendly, although very pragmatic).

2.

There is another more advanced use case: you want to pass returning values from one query into another. Here’s an example:

Datasource A has users list, datasource B has user events. You want to return list of active users in the past day along with their details. So you need to filter the users list in A based on ids from B.

So the query A on data source A might be:

SELECT name FROM users WHERE id IN ({{user_ids}});

The query B on B might be:

SELECT distinct user_id FROM events WHERE created_at > '...';

And you want the results from query B to be merged into the the query A.

Maybe this doesn’t have to be implemented as part of query results feature, though? Maybe it can be some syntax like parameters:

SELECT name FROM users WHERE id IN ({% query_results 456 %});

query_results is a function to load query results and we can have others to insert snippets and maybe other things?


I hope the explanation makes sense. Of course we can implement use case 1 without implementing use case 2.

Any feedback or thoughts will be welcomed :slight_smile:


ReQL query language
#2

Hi, Arik

Thank you for sharing your thoughts.
I can make sense about these use cases.

For now, I would like to talk about your “use case 1”.

First, here is PoC of my thoughts.

For example,

Query 1

SELECT * FROM film WHERE rating = '{{ rating }}'

Query 2

SELECT * FROM query_1('{"rating": "PG-13"}')

I prefer to use JSON as parameter.
The reason why JSON is very common format for developers and easy to understand.

I agree JSON is not friendly for some users.
However, I think SQL is very pragmatic and not friendly for non-technical users such as marketer, biz-dev and so on.

I think Query Results is advanced feature on Redash.
I guess this feature wanted by techinical guys, they don’t require any code completion.
(Of course, I want to use this feature with code completion :grinning:)

Please feel free to ask me about it and any feedback is welcome.


#3

Hi there, and thank you for such a great tool!
I think that I’m trying to do what is described here, in the simple use case (1). I have the “Active Users” query from GA with global parameters, like this one:

{
  "ids": "ga:xxxxxxxxx",
  "start_date": "{{Start Date}}",
  "end_date": "{{End Date}}",
  "metrics": "ga:users"
}

And I also have the “Total Users” number from our PostgreSQL database. Then I want to have the dashboard widget that displays Total / Active users numbers somehow like that:

select total.users || ' / ' || active.ga_users as 'Total / Active'
from query_9 as active
join query_7 as total on active.ga_dimension1 = total.role
order by Role;

This was working ok when parameters were hardcoded but it fails now. Is it correct that for now, it’s not possible to get the results like this with global parameters? If so, what a workaround could be?


#4

Hi @ariarijp, I tried your pull request with the current master (5.0.0-beta) but with no success, unfortunately – my existing queries became broken, e.g. select * from query_7 as users where users.role = 'staff' complains about missing users.role column. It works ok in master, however.


#5

@arikfr, I tried @ariarijp solution with the current master but got Error running query: 'query_9' is not a function. Was something changed since March that could break the patch? I did git log -- redash/query_runner but can’t see anything suspicious. What else could I do to troubleshoot the issue?


#6

Hi @leshik ,

Thank you for your feedback!
I will check my patch on 5.0.0 branch.


#7

@leshik I’ve checked that PoC on master branch commit 47068a6d911716cc2802542cdc5613e14602d561.

It’s looks like works for me.

Here is my queries,

Query 1 (It uses Redash metadata as a data source)

SELECT *
FROM EVENTS
WHERE action = '{{ action }}'
  AND object_type = '{{ object_type }}';

Query 2

SELECT *
FROM query_1('{"action":"view", "object_type":"query"}');

#8

@ariarijp do you have a patch against the current master? maybe I messed things up when trying to apply your patch.


#9

@leshik Sure, here you are.

diff --git a/redash/query_runner/query_results.py b/redash/query_runner/query_results.py
index fc0e09e4..db8fcf08 100644
--- a/redash/query_runner/query_results.py
+++ b/redash/query_runner/query_results.py
@@ -1,9 +1,12 @@
+import hashlib
 import json
 import logging
 import numbers
 import re
 import sqlite3
 
+import pystache
+import sqlparse
 from dateutil import parser
 
 from redash import models
@@ -42,9 +45,23 @@ def _guess_type(value):
     return TYPE_STRING
 
 
-def extract_query_ids(query):
-    queries = re.findall(r'(?:join|from)\s+query_(\d+)', query, re.IGNORECASE)
-    return [int(q) for q in queries]
+def extract_child_queries(query):
+    pattern = re.compile(r"^query_(\d+)(?:\('(\{.+\})'\))?", re.IGNORECASE)
+    queries = []
+
+    for token in sqlparse.parse(query)[0].tokens:
+        m = pattern.match(token.value)
+        if not m:
+            continue
+
+        queries.append({
+            'query_id': int(m.group(1)),
+            'params': {} if m.group(2) is None else json.loads(m.group(2)),
+            'table': 'tmp_' + hashlib.md5(token.value).hexdigest(),
+            'token': token.tokens[0].value if isinstance(token.tokens[0], sqlparse.sql.Function) else token.value,
+        })
+
+    return queries
 
 
 def _load_query(user, query_id):
@@ -54,26 +71,34 @@ def _load_query(user, query_id):
         raise PermissionError("Query id {} not found.".format(query.id))
 
     if not has_access(query.data_source.groups, user, not_view_only):
-        raise PermissionError(u"You are not allowed to execute queries on {} data source (used for query id {}).".format(
-            query.data_source.name, query.id))
+        raise PermissionError(
+            u"You are not allowed to execute queries on {} data source (used for query id {}).".format(
+                query.data_source.name, query.id))
 
     return query
 
 
-def create_tables_from_query_ids(user, connection, query_ids):
-    for query_id in set(query_ids):
-        query = _load_query(user, query_id)
+def create_tables_from_child_queries(user, connection, query, child_queries):
+    for child_query in child_queries:
+        _query = _load_query(user, child_query['query_id'])
 
-        results, error = query.data_source.query_runner.run_query(
-            query.query_text, user)
+        params = child_query['params']
+        if not params:
+            params = _get_default_params(_query)
+
+        query_with_params = pystache.render(_query.query_text, params)
+        results, error = _query.data_source.query_runner.run_query(query_with_params, user)
 
         if error:
             raise Exception(
-                "Failed loading results for query id {}.".format(query.id))
+                "Failed loading results for query id {}.".format(_query.id))
 
         results = json.loads(results)
-        table_name = 'query_{query_id}'.format(query_id=query_id)
+        table_name = child_query['table']
         create_table(connection, table_name, results)
+        query = query.replace(child_query['token'], table_name)
+
+    return query
 
 
 def fix_column_name(name):
@@ -101,6 +126,16 @@ def create_table(connection, table_name, query_results):
         connection.execute(insert_template, values)
 
 
+def _get_default_params(query):
+    params = {}
+
+    if 'parameters' in query.options:
+        for p in query.options['parameters']:
+            params[p['name']] = p['value']
+
+    return params
+
+
 class Results(BaseQueryRunner):
     noop_query = 'SELECT 1'
 
@@ -118,13 +153,13 @@ class Results(BaseQueryRunner):
 
     @classmethod
     def name(cls):
-        return "Query Results (Beta)"
+        return "Query Results Ex(Beta)"
 
     def run_query(self, query, user):
         connection = sqlite3.connect(':memory:')
 
-        query_ids = extract_query_ids(query)
-        create_tables_from_query_ids(user, connection, query_ids)
+        child_queries = extract_child_queries(query)
+        query = create_tables_from_child_queries(user, connection, query, child_queries)
 
         cursor = connection.cursor()
 

Good luck!


#10

@ariarijp Sorry, but I can’t make it work even using your queries. I’m able to execute the first query with parameters:

But the second one fails:

What might be the reason?


#11

More logs:

[2018-09-18 03:13:14,069][PID:73][INFO][Worker-21] task_name=redash.tasks.execute_query taks_id=a6c55223-e6d2-4d4c-ab64-8dd6c8176cb0 task=execute_query query_hash=a5010a00aaf69c88c09e36b60d1212c9 data_length=None error=[function query_19(unknown) does not exist
LINE 2: FROM query_19('{"action":"view", "object_type":"query"}');
             ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
]
[2018-09-18 03:13:14,074][PID:1][ERROR][MainProcess] Task redash.tasks.execute_query[a6c55223-e6d2-4d4c-ab64-8dd6c8176cb0] raised unexpected: QueryExecutionError('function query_19(unknown) does not exist\nLINE 2: FROM query_19(\'{"action":"view", "object_type":"query"}\');\n             ^\nHINT:  No function matches the given name and argument types. You might need to add explicit type casts.\n',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/app/redash/worker.py", line 71, in __call__
    return TaskBase.__call__(self, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/app/redash/tasks/queries.py", line 528, in execute_query
    scheduled_query).run()
  File "/app/redash/tasks/queries.py", line 470, in run
    raise result
QueryExecutionError: function query_19(unknown) does not exist
LINE 2: FROM query_19('{"action":"view", "object_type":"query"}');
             ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

#12

@leshik Hmm, which data source type are you using?

HINT:  No function matches the given name and argument types.

It looks like an error from PostgreSQL.


#13

@ariarijp As you suggested, I’m using re:dash DB as data source. So I just added its postgres DB to my data sources.


#14

@leshik I see, Can you run your second query on Query Results Data Source?


#15

@ariarijp omg, I’m so sorry for wasting your time, I just forgot to change the data source in the query editor. It works now with your provided example as well as with my case with GA. I will experiment more and will let you know if I hit some edge case. Thank you!


#16

@leshik Great! Happy Redash-ing! :redash:


#17

@ariarijp I figured out what was breaking my queries. After the PoC patch, it’s not possible to use queries like this one anymore:

select total.school
from query_7 as total;

This works on current master, but stops after applying the patch. Before the patch, this query succeds, but after it fails with Error running query: no such column: total.school.

My queries rely on joins that depend on tables aliases so they stopped working.


#18

@leshik Thank you for your feedback!

I understand what is annoying you.

For now, It is one of the limitations of this PoC.

Of course, I always welcome any patch for solving it :slight_smile:


#19

Is it possible to do things like that?
FROM query_254('{"mindate":'{{ mindate }}', "maxdate":'{{ maxdate }}'}');
This doesn’t work because of the brackets. Am I doing this wrong or are parameters into parameters not supported?