Worker timeout running Redash on Kubernetes

Hello all,

I’m migrating Redash to k8s and I’m having issues with the web server whenever I try to open the site. Here is the error it throws:

[2019-12-26 15:39:27 +0000] [1] [INFO] Starting gunicorn 19.7.1
[2019-12-26 15:39:27 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2019-12-26 15:39:27 +0000] [1] [INFO] Using worker: sync
[2019-12-26 15:39:27 +0000] [11] [INFO] Booting worker with pid: 11
[2019-12-26 15:39:27,878][PID:11][INFO][root] Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2019-12-26 15:39:27,904][PID:11][INFO][root] Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
[2019-12-26 15:40:11,341][PID:11][INFO][metrics] method=GET path=/ endpoint=redash_index status=302 content_type=text/html; charset=utf-8 content_length=313 duration=0.96 query_count=0 query_duration=0.00
[2019-12-26 15:40:42 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:11)
[2019-12-26 15:40:43 +0000] [19] [INFO] Booting worker with pid: 19
[2019-12-26 15:40:43,958][PID:19][INFO][root] Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
[2019-12-26 15:40:43,993][PID:19][INFO][root] Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt

Funny thing is, when I run it on my local machine it works just fine. I have no clue why it’s not working on the cluster (other things are running there without issues).

Any pointers would be greatly appreciated 🙂

Technical details:

  • Redash Version: 6.0.0.b8537
  • How did you install Redash: K8s

The deployments and service I created: https://pastebin.com/VUT5cpaX

I’m using one scheduler and two workers, one for ad-hoc queries and the other for scheduled queries and schema refresh.

Happy Holidays!

How much memory/CPU did you allocate to it?

The namespace has no memory/CPU limits, so I guess that’s up to Kubernetes to decide. Here is the resource usage reported by k8s:

NAME                                        READY    STATUS     RESTARTS    CPU(m)    MEM(Mi)
redash-ad-hoc-worker-5bc8647fc7-c97jl       1/1      Running    0           2         215
redash-scheduled-worker-7746898969-pjkcm    1/1      Running    0           2         231
redash-scheduler-6c6db79b45-s8clq           1/1      Running    0           2         199
redash-server-7b948f988c-hd4wh              1/1      Running    0           136       150

On my local cluster the server gets roughly the same amount of memory.

Is this actual usage or a limit imposed by K8s?

Actual usage. The numbers vary a bit, but that’s what I see when no requests are being made.

This is the redirect I get when it fails: https://redash.stg.me.net/login?next=https%3A%2F%2Fredash.stg.me.net%2F

Weird thing: I just tried https://redash.stg.me.net/ping and it returned pong without errors:

[2020-01-03 15:19:46,821][PID:145][INFO][metrics] method=GET path=/ping endpoint=redash_ping status=200 content_type=text/html; charset=utf-8 content_length=5 duration=0.31 query_count=0 query_duration=0.00
[2020-01-03 15:19:47,176][PID:145][INFO][metrics] method=GET path=/favicon.ico endpoint=redash_index status=302 content_type=text/html; charset=utf-8 content_length=335 duration=0.88 query_count=0 query_duration=0.00

https://redash.stg.me.net/status.json is not working

@arikfr The problem is solved; it was a dingdong on my part. The cluster didn’t have permission to access the database running on RDS 😅

Thanks for the help though!
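For anyone who lands here with the same symptom: a sync gunicorn worker that blocks while opening a database connection is killed by gunicorn’s default 30-second worker timeout. That is consistent with the WORKER TIMEOUT line appearing about 30 seconds after the redirect in the first log, and with /ping (which doesn’t need the database) answering while the login page and status.json, which presumably do, hang. A quick way to confirm that the pods can reach the database is a plain TCP check from inside the cluster. The sketch below is a hypothetical helper, not part of Redash: it assumes the connection string is in REDASH_DATABASE_URL (the variable Redash reads for its PostgreSQL connection), falls back to PostgreSQL’s default port 5432, and needs Python 3, so run it from a Python 3 pod in the same namespace rather than from the Python 2.7 based Redash 6 image.

```python
import os
import socket
from urllib.parse import urlsplit


def db_endpoint(url, default_port=5432):
    """Extract (host, port) from a URL such as postgresql://user:pass@host:5432/db."""
    parts = urlsplit(url)
    if not parts.hostname:
        # Without this check, a host of None would silently resolve to localhost.
        raise ValueError("no host in database URL")
    return parts.hostname, parts.port or default_port


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, DNS failures and timeouts all end up here
        # (socket.timeout and socket.gaierror are subclasses of OSError in Python 3).
        return False


if __name__ == "__main__":
    # Assumption: this runs in a pod that has the same REDASH_DATABASE_URL as the Redash pods.
    url = os.environ.get("REDASH_DATABASE_URL")
    if url:
        host, port = db_endpoint(url)
        print("%s:%d reachable: %s" % (host, port, can_connect(host, port)))
    else:
        print("REDASH_DATABASE_URL is not set")
```

A successful connect only proves network reachability (on RDS this is typically governed by the instance’s VPC security group); wrong credentials or database permissions still need to be checked separately, for example with psql.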