500 Response and Timeout of Worker when Testing Connection

Issue Summary

A 500 error code and a worker timeout occur when testing a data source connection.

Technical details:

  • Redash Version: 9.0.0-beta (8d548ecb)
  • Browser/OS: Windows 10
  • How did you install Redash: Docker-compose

Hi,

We have a self-hosted instance of Redash v9 set up on an internal Docker host. However, when trying to test the connection of any MSSQL (not ODBC) or Prometheus data source (I have not tried other sources), I get the following output after 30 seconds:

server_1        | [2021-01-22 10:08:24 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:74)
server_1        | [2021-01-22 10:08:24,894][PID:74][INFO][metrics] method=POST path=/api/data_sources/1/test endpoint=datasourcetestresource status=500 content_type=? content_length=-1 duration=30681.67 query_count=4 query_duration=7.25
server_1        | [2021-01-22 10:08:24 +0000] [74] [INFO] Worker exiting (pid: 74)
server_1        | [2021-01-22 10:08:25 +0000] [86] [INFO] Booting worker with pid: 86

I noticed a similar topic raised with the same output, but that one used Kubernetes and turned out to be an ingress issue ([TEST CONNECTION] api/data_sources/9/test - bad gateway). As a newcomer to Docker in general, I don’t really understand what the equivalent would be in my case.

I have tried the following:

  1. Taking a local version and trying from there (no dice)
  2. Removing our LDAP layer (no dice)
  3. Pinging the SQL server’s IP from each container and from the Docker host (this worked for all containers)
  4. Testing the parameters I entered with pymssql in Python 3 (this worked)

This is the partially redacted docker-compose file we are using. Is there anything funky with it that anyone can see, or anything obvious I’m missing?

version: "2"
x-redash-service: &redash-service
  build: .
  depends_on:
    - postgres
    - redis
  env_file: ./.env
services:
  server:
    <<: *redash-service
    command: server
    ports:
      - "5000:5000"
    environment:
      - REDASH_WEB_WORKERS=2
      - REDASH_LDAP_LOGIN_ENABLED=true
      - REDASH_LDAP_URL=REDACTED
      - REDASH_LDAP_BIND_DN=CN=LDAP_Redash,OU=Service Accounts,DC=REDACT,DC=NET
      - REDASH_LDAP_BIND_DN_PASSWORD=REDACTED
      - REDASH_LDAP_SEARCH_DN=OU=Sites,DC=REDACTED,DC=NET
      - REDASH_LDAP_SEARCH_TEMPLATE=(sAMAccountName=%(username)s)
      - REDASH_LDAP_DISPLAY_NAME_KEY=cn
      - REDASH_PASSWORD_LOGIN_ENABLED=false
  scheduler:
    <<: *redash-service
    command: scheduler
  worker:
    <<: *redash-service
    command: worker
    environment:
      - QUEUES="periodic emails default"
      - WORKERS_COUNT=1
  query_worker:
    <<: *redash-service
    command: worker
    environment:
      - QUEUES="scheduled_queries schema queries"
      - WORKERS_COUNT=1
  redis:
    image: redis:5.0-alpine
    restart: always
  postgres:
    image: postgres:9.6-alpine
    environment:
      - POSTGRES_HOST_AUTH_METHOD=trust
    volumes:
      - /data/redash/postgres:/var/lib/postgresql/data
    restart: always
  nginx:
    image: redash/nginx:latest
    ports:
      - "8901:80"
    depends_on:
      - server
    links:
      - server:redash
    restart: always

Thanks!

Did you do this from within one of the Docker containers?

I know you mentioned that you can ping the database server from within a container. But that doesn’t necessarily mean database traffic can pass through the firewall: ping uses ICMP, while MSSQL traffic uses TCP (typically port 1433), so a successful ping doesn’t prove the database port is reachable.
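One quick way to check the TCP port itself from inside a container is a raw socket connection with Python’s standard library. This is a minimal sketch; the hostname is a placeholder for your SQL Server, and 1433 is only the default MSSQL port:

```python
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Attempt a TCP connection to host:port; True means the port answered."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failure, connection refused, and timeout.
        return False

if __name__ == "__main__":
    # "sqlserver.example.internal" is a placeholder -- substitute your server.
    print(can_reach("sqlserver.example.internal", 1433))
```

If this returns False from inside the container while your pymssql test works elsewhere, the problem is network reachability rather than Redash itself.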

Hi! Thanks for the response.

I did not do that initially, no. Since your message I have tried running a Python script using pymssql from within the redash_server_1 container, fetching one row from the database I’m trying to connect to, and this worked:

Since the error originated from server_1, I wasn’t sure if this was the best place to run it from. Should I try the others? Is nginx causing a complication here?

I can confirm this is also the case for the worker and query_worker containers in the docker-compose file above.

I wasn’t able to find the root cause, but I remade the Redash instance from scratch starting from v8. I suspect it was because whoever set up our instance had not force-recreated the image, so it was running a v8 image with a v9 docker-compose file. I’d recommend that anyone experiencing this remake their v8 instance, then retry the upgrade steps, taking backups at each step.
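For anyone hitting the same image/compose mismatch, this is a sketch of what force-recreating means in a docker-compose setup (assuming the standard docker-compose CLI, run from the directory containing the compose file):

```shell
# Rebuild the Redash image from the compose file's build context,
# ignoring any cached layers from the old version.
docker-compose build --no-cache

# Recreate all containers from the freshly built image,
# even if the compose file itself hasn't changed.
docker-compose up -d --force-recreate
```

Without --force-recreate, compose can keep reusing existing containers, which is how a stale v8 image ends up paired with a v9 compose file.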
