I’ve been playing with the idea of having a custom SQL syntax for creating composed queries based on the results of other queries. I’m calling it ReSQL.
The syntax is based on sqlite’s SQL flavour but adds some distinctive productions so it’s obvious at a glance what is standard SQL and what’s not. I imagine most power-users of Re:Dash are familiar with SQL already.
provider "[" params "]" ( "::" mapper )*
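As a rough illustration of the production above, here is a minimal sketch of how such expressions could be recognized. This is purely hypothetical code, not part of any implementation; the regex deliberately ignores nesting and quoting corner cases.

```python
import re

# Hypothetical recognizer for the ReSQL extension production:
#   provider "[" params "]" ( "::" mapper )*
RESQL_EXPR = re.compile(
    r"(?P<provider>\w+)"                        # e.g. query, http, datasource
    r"\[(?P<params>[^\]]*)\]"                   # bracketed parameter list (no nesting)
    r"(?P<mappers>(?:::\w+(?:\[[^\]]*\])?)*)"   # zero or more ::mapper[...] in a chain
)

m = RESQL_EXPR.search("SELECT * FROM http['https://foobar.com/report.csv']::csv")
print(m.group("provider"))  # http
print(m.group("mappers"))   # ::csv
```

A real parser would need to handle quoted strings containing brackets and the heredoc form, but the sketch shows the shape of the extension points.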
Examples
-- Fetch the number of rows reported by another query
SELECT COUNT(*) FROM query[3]
-- Filter a parametrized query resultset
SELECT * FROM query[id:3 p_country:'CAT']
-- Join two queries and aggregate
SELECT countries.name, COUNT(*)
FROM query[3] AS top_users
JOIN query[4] AS countries ON top_users.country = countries.id
GROUP BY countries.name
Additionally, the syntax allows performing ad-hoc queries against the configured data sources:
-- Fetch a remote CSV and apply a mapping function to parse it
SELECT * FROM http['https://foobar.com/report.csv']::csv
-- Custom HTTP requests with a JSONPath mapping
WITH authors AS http[
url: 'http://foobar.com/report'
AUTHORIZATION: 'token ${variable}' -- variable interpolation
body: [= -- initiate heredoc
{
"store": "Amazon",
"report": "books"
}
=]
]::json[' $library.book[*].author ']
SELECT *
FROM authors
WHERE language = 'en'
-- Use a configured data source for an adhoc query
WITH users AS datasource['Staging MySQL', [=
SELECT * FROM mysql_table -- would run on MySQL
=]]
SELECT COUNT(*) FROM users
-- Free text search modifier
SELECT title, year
FROM http['https://foobar.com/movies.csv']::csv::fts[tokenizer: 'porter']
WHERE title MATCH 'lotr OR lord rings'
I tried to keep the examples short, so they don't fully show the potential the design can unlock through composition.
How would it work?
A new data source (query runner) for ReSQL would be available, so it builds on the existing infrastructure. When the server receives a query execution request from the browser, it:
- Parses the query to verify the syntax
- Resolves the ReSQL expressions (query[...]) and checks that they exist and that the user is allowed to use them
- Triggers the execution of each ReSQL expression
- Creates a temporary sqlite3 database
- Inserts the results into separate tables in the temporary DB
- Rewrites the query, converting the ReSQL expressions into references to the created tables
- Executes the query against the sqlite database
- Reports back the result of the sqlite query
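The last few steps can be sketched in a few lines of Python. This is a hand-wavy illustration, not the actual runner: execute_resql and the query_3 table name are made up, and it assumes the child results are already fetched and the ReSQL expressions already rewritten to plain table names.

```python
import sqlite3

def execute_resql(query_text, resolved_results):
    """Hypothetical sketch of the tail of the ReSQL pipeline.

    resolved_results maps a placeholder table name (e.g. 'query_3')
    to (column_names, rows) already fetched from the child queries.
    """
    # Create a throwaway in-memory sqlite3 database
    db = sqlite3.connect(":memory:")

    # Load each child result set into its own table
    for table, (columns, rows) in resolved_results.items():
        col_list = ", ".join(columns)
        db.execute(f"CREATE TABLE {table} ({col_list})")
        marks = ", ".join("?" for _ in columns)
        db.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)

    # query_text is assumed to already have its query[...] expressions
    # rewritten to the table names above
    return db.execute(query_text).fetchall()

# SELECT COUNT(*) FROM query[3], rewritten to reference table query_3
rows = execute_resql(
    "SELECT COUNT(*) FROM query_3",
    {"query_3": (["id", "country"], [(1, "CAT"), (2, "FR")])},
)
print(rows)  # [(2,)]
```

A production version would of course sanitize table and column names instead of interpolating them directly into DDL.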
I also considered using sqlite3's native virtual tables feature, but it feels like a pain to troubleshoot, and the sqlite3 parser produces poor syntax error messages, so I wanted a more user-friendly parser on top.
Using the main PostgreSQL database for this was also discarded because it would be a pain to isolate properly. With sqlite3 there is total isolation, and it's easy to define memory usage limits.
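To make the isolation point concrete: each execution would get its own in-memory database, and sqlite exposes a per-connection memory cap via the soft_heap_limit pragma (the 64 MiB figure below is an arbitrary example, not a proposed default).

```python
import sqlite3

# Each ReSQL execution gets a fresh, fully isolated in-memory database
db = sqlite3.connect(":memory:")

# Advisory cap on sqlite's heap usage for this connection (~64 MiB);
# builds older than SQLite 3.7.17 will simply ignore this pragma
db.execute("PRAGMA soft_heap_limit = 67108864")
```

Nothing in this database outlives the connection, so a runaway query can be killed by closing it.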
Another benefit of using sqlite3 with a custom parser is that we could offload some of the work to the browser (if it still supports WebSQL) while the user is editing the query.
Note that using sqlite3 is an implementation detail: it lets us skip a large chunk of the implementation effort by using it to execute the final query. However, I can imagine that in the future ReSQL could be extended with a fully native execution engine that avoids the constant marshalling of data.
Caveats
- First of all, it's not a simple project to implement, although it certainly seems feasible.
- Users could easily end up abusing the system by relying on ReSQL for everything instead of properly preparing the data elsewhere.
- Schema browser support would be flaky
Closing words
Overall I think it offers a lot of potential: having a simple, declarative way to massage the data would allow less technically inclined users to benefit from the system. It fits a nice middle ground between parametrized queries and a Python/Docker data source.