Issue Summary

A summary of the issue and the browser/OS environment in which it occurs.

  1. For some queries we have to click “execute” button multiple times.
  2. Sometimes alerts are not getting triggered.

No errors I can find in any of container logs . We are open to pay for consulting engagement

Technical details:

  • Redash Version: V10
  • Browser/OS: Chrome/ MAC
  • How did you install Redash: AMI then upgrade, added Oracle support
    Here our docker-compose.yml:
version: "2"
x-redash-service: &redash-service
  image: redash_oracle_v10:latest
  env_file: /opt/redash/env
  restart: always
services:
  server:
    <<: *redash-service
    command: server
    ports:
      - "5000:5000"
    environment:
      REDASH_WEB_WORKERS: 4
  scheduler:
    <<: *redash-service
    command: scheduler
  scheduled_worker:
    <<: *redash-service
    command: worker
    environment:
      QUEUES: "scheduled_queries,schemas"
      WORKERS_COUNT: 1
  adhoc_worker:
    <<: *redash-service
    command: worker
    environment:
      QUEUES: "queries,schemas"
      WORKERS_COUNT: 4
  worker:
   <<: *redash-service
   command: worker
   environment:
    QUEUES: "periodic,emails,default"
    WORKERS_COUNT: 1
  nginx:
    image: redash/nginx:latest
    ports:
      - "80:80"
    depends_on:
      - server
    links:
      - server:redash
    restart: always

Where did you find the redash_oracle_v10:latest image you’re using? This is not one of the official ones.

It was created while adding Oracle support, here is Dockerfile:


FROM redash/redash:10.0.0.b50363

USER root

# Oracle instantclient
ADD oracle/instantclient-basic-linux.x64-11.2.0.4.0.zip /tmp/instantclient-basic-linux.x64-11.2.0.4.0.zip
ADD oracle/instantclient-sdk-linux.x64-11.2.0.4.0.zip /tmp/instantclient-sdk-linux.x64-11.2.0.4.0.zip
ADD oracle/instantclient-sqlplus-linux.x64-11.2.0.4.0.zip /tmp/instantclient-sqlplus-linux.x64-11.2.0.4.0.zip

RUN apt-get update  -y
RUN apt-get install -y unzip

RUN unzip /tmp/instantclient-basic-linux.x64-11.2.0.4.0.zip -d /usr/local/
RUN unzip /tmp/instantclient-sdk-linux.x64-11.2.0.4.0.zip -d /usr/local/
RUN unzip /tmp/instantclient-sqlplus-linux.x64-11.2.0.4.0.zip -d /usr/local/
RUN ln -s /usr/local/instantclient_11_2 /usr/local/instantclient
RUN ln -s /usr/local/instantclient/libclntsh.so.11.1 /usr/local/instantclient/libclntsh.so
RUN ln -s /usr/local/instantclient/sqlplus /usr/bin/sqlplus

RUN apt-get install libaio-dev -y
RUN apt-get clean -y

ENV ORACLE_HOME=/usr/local/instantclient
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/instantclient

RUN pip install cx_Oracle==5.2.1

USER redash
#Add REDASH ENV to add Oracle Query Runner 
ENV REDASH_ADDITIONAL_QUERY_RUNNERS=redash.query_runner.oracle

Good thanks.

You have not provided enough information to debug, yet.

Please examine your browser’s network inspector when you click the Execute button. Are there multiple network requests? Or does the first press do nothing?

It’s not clear if the issue is with your front-end or the API.

I do see multiple network requests, sometime a lot of them , when query return results, sometime just 3

good. now focus on the network requests when the problem reproduces. You said the problem is you click Execute and nothing happens.

When “nothing happens”, what do you see in these network requests? Particularly the call to the /results endpoint.

nothing happens refers to:

  1. not getting results back
  2. at the bottom (where the result count) we see “0 rows - runtime”

we see 3 requests when that happens:

  1. results:
    {“job”: {“id”: “2bee02b3-3a2a-4138-bfc0-f7b39ff324ae”, “updated_at”: 0, “status”: 1, “error”: “”, “result”: null, “query_result_id”: null}}
  2. api/jobs/2bee02b3-3a2a-4138-bfc0-f7b39ff324ae:
    {“job”: {“id”: “2bee02b3-3a2a-4138-bfc0-f7b39ff324ae”, “updated_at”: 0, “status”: 4, “error”: “”, “result”: null, “query_result_id”: null}}
    3… events:
    null

How many rows does this query return if you run it on a separate tool? It looks like the worker is probably dying because it runs out of memory.

16, I can’t find any errors in the logs

There wouldn’t be errors in the logs for this. If the worker dies it just dies.

How much RAM on your instance?

Also, is this a query results data source query?

15 GB available on the host,
Yes, query done through data source

Available to Docker? Many system limit docker’s ram

About the data source: I’m asking specifically what kind of data source the query uses.

The null/empty result comes from
MySQL, oracle, and redshift. I haven’t checked others.

As to the ram limitation, how can we check?

This installation is based of your IAM image on a bigger instance type.

ATM we set 6 web servers, 20 workers, 16 adhoc. Free memory is 6g.

Why do you have six web servers? Are you running k8s by chance? That might explain the weird performance…

We do have 6 redash_web_workers, not web servers, we are running on EC2, not k8s

Here id docker container stats:

Cool thanks. The next step is to check the logs for your adhoc_worker. In the network response you pasted above, the job returns a code 4 which means query execution failed. The logs for adhoc worker will show more about what went wrong.

You can do this by SSH’ing into the server:

$ sudo docker-compose logs -f --tail 100 adhoc_worker

Then execute the query in your browser and watch the logs for a stack trace.

There isn’t any stacktrace.

also looking for the failed run didn’t show anything

root@ip-x-x-x-x:/opt/redash# docker logs --tail 1000 redash_adhoc_worker_1 &> log.log
root@ip-x-x-x-x:/opt/redash# grep “8b579138-cef8-456d-9e81-ab9ab4a5dc7b” log.log

wait, i miss spoke

adhoc_worker_1 | [2021-11-25 19:03:33,420][PID:2416][WARNING][rq.job.redash.tasks.queries.execution] job.func_name=redash.tasks.queries.execution.execute_query job.id=edb92cb8-6ab2-4d40-8180-e9396dc04584 Unexpected error while running query:
adhoc_worker_1 | Traceback (most recent call last):
adhoc_worker_1 | File “/app/redash/tasks/queries/execution.py”, line 188, in run
adhoc_worker_1 | data, error = query_runner.run_query(annotated_query, self.user)
adhoc_worker_1 | File “/app/redash/query_runner/mysql.py”, line 168, in run_query
adhoc_worker_1 | while not ev.wait(1):
adhoc_worker_1 | File “/usr/local/lib/python3.7/threading.py”, line 552, in wait
adhoc_worker_1 | signaled = self._cond.wait(timeout)
adhoc_worker_1 | File “/usr/local/lib/python3.7/threading.py”, line 300, in wait
adhoc_worker_1 | gotit = waiter.acquire(True, timeout)
adhoc_worker_1 | File “/app/redash/tasks/queries/execution.py”, line 129, in signal_handler
adhoc_worker_1 | raise InterruptException
adhoc_worker_1 | redash.query_runner.InterruptException

interesting thing is that I tried it a few more times and did not get stack trace for those failures.

Thanks,
Nir Ben Ezri
DBA Team Lead
DistroKid
https://distrokid.com

Also on some queries, I’ve noticed following warning, but it happens only sometimes

cursor.execute(query)
/app/redash/query_runner/mysql.py:181: Warning: (1292, “Truncated incorrect DOUBLE value: ‘646030 13627’”)
cursor.execute(query)
/app/redash/query_runner/mysql.py:181: Warning: (1292, “Truncated incorrect DOUBLE value: ‘646030 13798’”)
cursor.execute(query)
/app/redash/query_runner/mysql.py:181: Warning: (1292, “Truncated incorrect DOUBLE value: ‘646030 14970’”)
cursor.execute(query)
/app/redash/query_runner/mysql.py:181: Warning: (1292, “Truncated incorrect DOUBLE value: ‘646030 15302’”)
cursor.execute(query)