Redash V10 query execution

afridland · November 23, 2021, 5:06am

Issue Summary

A summary of the issue and the browser/OS environment in which it occurs.

For some queries we have to click “execute” button multiple times.
Sometimes alerts are not getting triggered.

No errors I can find in any of container logs . We are open to pay for consulting engagement

Technical details:

Redash Version: V10
Browser/OS: Chrome/ MAC
How did you install Redash: AMI then upgrade, added Oracle support
Here our docker-compose.yml:

version: "2"
x-redash-service: &redash-service
  image: redash_oracle_v10:latest
  env_file: /opt/redash/env
  restart: always
services:
  server:
    <<: *redash-service
    command: server
    ports:
      - "5000:5000"
    environment:
      REDASH_WEB_WORKERS: 4
  scheduler:
    <<: *redash-service
    command: scheduler
  scheduled_worker:
    <<: *redash-service
    command: worker
    environment:
      QUEUES: "scheduled_queries,schemas"
      WORKERS_COUNT: 1
  adhoc_worker:
    <<: *redash-service
    command: worker
    environment:
      QUEUES: "queries,schemas"
      WORKERS_COUNT: 4
  worker:
   <<: *redash-service
   command: worker
   environment:
    QUEUES: "periodic,emails,default"
    WORKERS_COUNT: 1
  nginx:
    image: redash/nginx:latest
    ports:
      - "80:80"
    depends_on:
      - server
    links:
      - server:redash
    restart: always

jesse · November 23, 2021, 2:44pm

Where did you find the redash_oracle_v10:latest image you’re using? This is not one of the official ones.

afridland · November 23, 2021, 3:01pm

It was created while adding Oracle support, here is Dockerfile:

FROM redash/redash:10.0.0.b50363

USER root

# Oracle instantclient
ADD oracle/instantclient-basic-linux.x64-11.2.0.4.0.zip /tmp/instantclient-basic-linux.x64-11.2.0.4.0.zip
ADD oracle/instantclient-sdk-linux.x64-11.2.0.4.0.zip /tmp/instantclient-sdk-linux.x64-11.2.0.4.0.zip
ADD oracle/instantclient-sqlplus-linux.x64-11.2.0.4.0.zip /tmp/instantclient-sqlplus-linux.x64-11.2.0.4.0.zip

RUN apt-get update  -y
RUN apt-get install -y unzip

RUN unzip /tmp/instantclient-basic-linux.x64-11.2.0.4.0.zip -d /usr/local/
RUN unzip /tmp/instantclient-sdk-linux.x64-11.2.0.4.0.zip -d /usr/local/
RUN unzip /tmp/instantclient-sqlplus-linux.x64-11.2.0.4.0.zip -d /usr/local/
RUN ln -s /usr/local/instantclient_11_2 /usr/local/instantclient
RUN ln -s /usr/local/instantclient/libclntsh.so.11.1 /usr/local/instantclient/libclntsh.so
RUN ln -s /usr/local/instantclient/sqlplus /usr/bin/sqlplus

RUN apt-get install libaio-dev -y
RUN apt-get clean -y

ENV ORACLE_HOME=/usr/local/instantclient
ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/instantclient

RUN pip install cx_Oracle==5.2.1

USER redash
#Add REDASH ENV to add Oracle Query Runner 
ENV REDASH_ADDITIONAL_QUERY_RUNNERS=redash.query_runner.oracle

jesse · November 23, 2021, 4:47pm

Good thanks.

You have not provided enough information to debug, yet.

Please examine your browser’s network inspector when you click the Execute button. Are there multiple network requests? Or does the first press do nothing?

It’s not clear if the issue is with your front-end or the API.

afridland · November 24, 2021, 5:14am

I do see multiple network requests, sometime a lot of them , when query return results, sometime just 3

jesse · November 24, 2021, 5:29am

good. now focus on the network requests when the problem reproduces. You said the problem is you click Execute and nothing happens.

When “nothing happens”, what do you see in these network requests? Particularly the call to the /results endpoint.

Nir · November 24, 2021, 8:12am

nothing happens refers to:

not getting results back
at the bottom (where the result count) we see “0 rows - runtime”

we see 3 requests when that happens:

results:
{“job”: {“id”: “2bee02b3-3a2a-4138-bfc0-f7b39ff324ae”, “updated_at”: 0, “status”: 1, “error”: “”, “result”: null, “query_result_id”: null}}
api/jobs/2bee02b3-3a2a-4138-bfc0-f7b39ff324ae:
{“job”: {“id”: “2bee02b3-3a2a-4138-bfc0-f7b39ff324ae”, “updated_at”: 0, “status”: 4, “error”: “”, “result”: null, “query_result_id”: null}}
3… events:
null

jesse · November 24, 2021, 7:14pm

How many rows does this query return if you run it on a separate tool? It looks like the worker is probably dying because it runs out of memory.

afridland · November 24, 2021, 7:44pm

16, I can’t find any errors in the logs

jesse · November 24, 2021, 7:51pm

There wouldn’t be errors in the logs for this. If the worker dies it just dies.

How much RAM on your instance?

Also, is this a query results data source query?

afridland · November 24, 2021, 9:00pm

15 GB available on the host,
Yes, query done through data source

jesse · November 24, 2021, 9:24pm

Available to Docker? Many system limit docker’s ram

About the data source: I’m asking specifically what kind of data source the query uses.

Nir · November 24, 2021, 10:03pm

The null/empty result comes from
MySQL, oracle, and redshift. I haven’t checked others.

As to the ram limitation, how can we check?

This installation is based of your IAM image on a bigger instance type.

ATM we set 6 web servers, 20 workers, 16 adhoc. Free memory is 6g.

jesse · November 24, 2021, 10:19pm

Why do you have six web servers? Are you running k8s by chance? That might explain the weird performance…

afridland · November 25, 2021, 5:13am

We do have 6 redash_web_workers, not web servers, we are running on EC2, not k8s

Here id docker container stats:

jesse · November 25, 2021, 6:42pm

Cool thanks. The next step is to check the logs for your adhoc_worker. In the network response you pasted above, the job returns a code 4 which means query execution failed. The logs for adhoc worker will show more about what went wrong.

You can do this by SSH’ing into the server:

$ sudo docker-compose logs -f --tail 100 adhoc_worker

Then execute the query in your browser and watch the logs for a stack trace.

Nir · November 25, 2021, 7:02pm

There isn’t any stacktrace.

also looking for the failed run didn’t show anything

root@ip-x-x-x-x:/opt/redash# docker logs --tail 1000 redash_adhoc_worker_1 &> log.log
root@ip-x-x-x-x:/opt/redash# grep “8b579138-cef8-456d-9e81-ab9ab4a5dc7b” log.log

Nir · November 25, 2021, 7:04pm

wait, i miss spoke

adhoc_worker_1 | [2021-11-25 19:03:33,420][PID:2416][WARNING][rq.job.redash.tasks.queries.execution] job.func_name=redash.tasks.queries.execution.execute_query job.id=edb92cb8-6ab2-4d40-8180-e9396dc04584 Unexpected error while running query:
adhoc_worker_1 | Traceback (most recent call last):
adhoc_worker_1 | File “/app/redash/tasks/queries/execution.py”, line 188, in run
adhoc_worker_1 | data, error = query_runner.run_query(annotated_query, self.user)
adhoc_worker_1 | File “/app/redash/query_runner/mysql.py”, line 168, in run_query
adhoc_worker_1 | while not ev.wait(1):
adhoc_worker_1 | File “/usr/local/lib/python3.7/threading.py”, line 552, in wait
adhoc_worker_1 | signaled = self._cond.wait(timeout)
adhoc_worker_1 | File “/usr/local/lib/python3.7/threading.py”, line 300, in wait
adhoc_worker_1 | gotit = waiter.acquire(True, timeout)
adhoc_worker_1 | File “/app/redash/tasks/queries/execution.py”, line 129, in signal_handler
adhoc_worker_1 | raise InterruptException
adhoc_worker_1 | redash.query_runner.InterruptException

Nir · November 25, 2021, 7:26pm

interesting thing is that I tried it a few more times and did not get stack trace for those failures.

Thanks,
Nir Ben Ezri
DBA Team Lead
DistroKid
https://distrokid.com

afridland · November 25, 2021, 8:02pm

Also on some queries, I’ve noticed following warning, but it happens only sometimes

cursor.execute(query)
/app/redash/query_runner/mysql.py:181: Warning: (1292, “Truncated incorrect DOUBLE value: ‘646030 13627’”)
cursor.execute(query)
/app/redash/query_runner/mysql.py:181: Warning: (1292, “Truncated incorrect DOUBLE value: ‘646030 13798’”)
cursor.execute(query)
/app/redash/query_runner/mysql.py:181: Warning: (1292, “Truncated incorrect DOUBLE value: ‘646030 14970’”)
cursor.execute(query)
/app/redash/query_runner/mysql.py:181: Warning: (1292, “Truncated incorrect DOUBLE value: ‘646030 15302’”)
cursor.execute(query)