Issue Summary
A summary of the issue and the browser/OS environment in which it occurs.
For some queries we have to click “execute” button multiple times.
Sometimes alerts are not getting triggered.
No errors I can find in any of container logs . We are open to pay for consulting engagement
Technical details:
Redash Version: V10
Browser/OS: Chrome/ MAC
How did you install Redash: AMI then upgrade, added Oracle support
Here our docker-compose.yml:
version: "2"
x-redash-service: &redash-service
image: redash_oracle_v10:latest
env_file: /opt/redash/env
restart: always
services:
server:
<<: *redash-service
command: server
ports:
- "5000:5000"
environment:
REDASH_WEB_WORKERS: 4
scheduler:
<<: *redash-service
command: scheduler
scheduled_worker:
<<: *redash-service
command: worker
environment:
QUEUES: "scheduled_queries,schemas"
WORKERS_COUNT: 1
adhoc_worker:
<<: *redash-service
command: worker
environment:
QUEUES: "queries,schemas"
WORKERS_COUNT: 4
worker:
<<: *redash-service
command: worker
environment:
QUEUES: "periodic,emails,default"
WORKERS_COUNT: 1
nginx:
image: redash/nginx:latest
ports:
- "80:80"
depends_on:
- server
links:
- server:redash
restart: always
jesse
November 23, 2021, 2:44pm
2
Where did you find the redash_oracle_v10:latest
image you’re using? This is not one of the official ones.
It was created while adding Oracle support, here is Dockerfile:
FROM redash/redash:10.0.0.b50363
USER root
# Oracle instantclient
ADD oracle/instantclient-basic-linux.x64-11.2.0.4.0.zip /tmp/instantclient-basic-linux.x64-11.2.0.4.0.zip
ADD oracle/instantclient-sdk-linux.x64-11.2.0.4.0.zip /tmp/instantclient-sdk-linux.x64-11.2.0.4.0.zip
ADD oracle/instantclient-sqlplus-linux.x64-11.2.0.4.0.zip /tmp/instantclient-sqlplus-linux.x64-11.2.0.4.0.zip
RUN apt-get update -y
RUN apt-get install -y unzip
RUN unzip /tmp/instantclient-basic-linux.x64-11.2.0.4.0.zip -d /usr/local/
RUN unzip /tmp/instantclient-sdk-linux.x64-11.2.0.4.0.zip -d /usr/local/
RUN unzip /tmp/instantclient-sqlplus-linux.x64-11.2.0.4.0.zip -d /usr/local/
RUN ln -s /usr/local/instantclient_11_2 /usr/local/instantclient
RUN ln -s /usr/local/instantclient/libclntsh.so.11.1 /usr/local/instantclient/libclntsh.so
RUN ln -s /usr/local/instantclient/sqlplus /usr/bin/sqlplus
RUN apt-get install libaio-dev -y
RUN apt-get clean -y
ENV ORACLE_HOME=/usr/local/instantclient
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/instantclient
RUN pip install cx_Oracle==5.2.1
USER redash
#Add REDASH ENV to add Oracle Query Runner
ENV REDASH_ADDITIONAL_QUERY_RUNNERS=redash.query_runner.oracle
jesse
November 23, 2021, 4:47pm
4
Good thanks.
You have not provided enough information to debug, yet.
Please examine your browser’s network inspector when you click the Execute button. Are there multiple network requests? Or does the first press do nothing?
It’s not clear if the issue is with your front-end or the API.
I do see multiple network requests, sometime a lot of them , when query return results, sometime just 3
jesse
November 24, 2021, 5:29am
6
good. now focus on the network requests when the problem reproduces . You said the problem is you click Execute and nothing happens.
When “nothing happens”, what do you see in these network requests? Particularly the call to the /results endpoint.
Nir
November 24, 2021, 8:12am
7
nothing happens refers to:
not getting results back
at the bottom (where the result count) we see “0 rows - runtime”
we see 3 requests when that happens:
results:
{“job”: {“id”: “2bee02b3-3a2a-4138-bfc0-f7b39ff324ae”, “updated_at”: 0, “status”: 1, “error”: “”, “result”: null, “query_result_id”: null}}
api/jobs/2bee02b3-3a2a-4138-bfc0-f7b39ff324ae:
{“job”: {“id”: “2bee02b3-3a2a-4138-bfc0-f7b39ff324ae”, “updated_at”: 0, “status”: 4, “error”: “”, “result”: null, “query_result_id”: null}}
3… events:
null
jesse
November 24, 2021, 7:14pm
8
How many rows does this query return if you run it on a separate tool? It looks like the worker is probably dying because it runs out of memory.
16, I can’t find any errors in the logs
jesse
November 24, 2021, 7:51pm
10
There wouldn’t be errors in the logs for this. If the worker dies it just dies.
How much RAM on your instance?
Also, is this a query results data source query?
15 GB available on the host,
Yes, query done through data source
jesse
November 24, 2021, 9:24pm
12
Available to Docker? Many system limit docker’s ram
About the data source: I’m asking specifically what kind of data source the query uses.
Nir
November 24, 2021, 10:03pm
13
The null/empty result comes from
MySQL, oracle, and redshift. I haven’t checked others.
As to the ram limitation, how can we check?
This installation is based of your IAM image on a bigger instance type.
ATM we set 6 web servers, 20 workers, 16 adhoc. Free memory is 6g.
jesse
November 24, 2021, 10:19pm
14
Why do you have six web servers? Are you running k8s by chance? That might explain the weird performance…
We do have 6 redash_web_workers, not web servers, we are running on EC2, not k8s
Here id docker container stats:
jesse
November 25, 2021, 6:42pm
16
Cool thanks. The next step is to check the logs for your adhoc_worker
. In the network response you pasted above, the job returns a code 4 which means query execution failed. The logs for adhoc worker will show more about what went wrong.
You can do this by SSH’ing into the server:
$ sudo docker-compose logs -f --tail 100 adhoc_worker
Then execute the query in your browser and watch the logs for a stack trace.
Nir
November 25, 2021, 7:02pm
17
There isn’t any stacktrace.
also looking for the failed run didn’t show anything
root@ip-x-x-x-x:/opt/redash# docker logs --tail 1000 redash_adhoc_worker_1 &> log.log
root@ip-x-x-x-x:/opt/redash# grep “8b579138-cef8-456d-9e81-ab9ab4a5dc7b” log.log
Nir
November 25, 2021, 7:04pm
18
wait, i miss spoke
adhoc_worker_1 | [2021-11-25 19:03:33,420][PID:2416][WARNING][rq.job.redash.tasks.queries.execution] job.func_name=redash.tasks.queries.execution.execute_query job.id =edb92cb8-6ab2-4d40-8180-e9396dc04584 Unexpected error while running query:
adhoc_worker_1 | Traceback (most recent call last):
adhoc_worker_1 | File “/app/redash/tasks/queries/execution.py”, line 188, in run
adhoc_worker_1 | data, error = query_runner.run_query(annotated_query, self.user)
adhoc_worker_1 | File “/app/redash/query_runner/mysql.py”, line 168, in run_query
adhoc_worker_1 | while not ev.wait(1):
adhoc_worker_1 | File “/usr/local/lib/python3.7/threading.py”, line 552, in wait
adhoc_worker_1 | signaled = self._cond.wait(timeout)
adhoc_worker_1 | File “/usr/local/lib/python3.7/threading.py”, line 300, in wait
adhoc_worker_1 | gotit = waiter.acquire(True, timeout)
adhoc_worker_1 | File “/app/redash/tasks/queries/execution.py”, line 129, in signal_handler
adhoc_worker_1 | raise InterruptException
adhoc_worker_1 | redash.query_runner.InterruptException
Nir
November 25, 2021, 7:26pm
19
interesting thing is that I tried it a few more times and did not get stack trace for those failures.
Thanks,
Nir Ben Ezri
DBA Team Lead
DistroKid
https://distrokid.com
Also on some queries, I’ve noticed following warning, but it happens only sometimes
cursor.execute(query)
/app/redash/query_runner/mysql.py:181: Warning: (1292, “Truncated incorrect DOUBLE value: ‘646030 13627’”)
cursor.execute(query)
/app/redash/query_runner/mysql.py:181: Warning: (1292, “Truncated incorrect DOUBLE value: ‘646030 13798’”)
cursor.execute(query)
/app/redash/query_runner/mysql.py:181: Warning: (1292, “Truncated incorrect DOUBLE value: ‘646030 14970’”)
cursor.execute(query)
/app/redash/query_runner/mysql.py:181: Warning: (1292, “Truncated incorrect DOUBLE value: ‘646030 15302’”)
cursor.execute(query)