Issue Summary

I used the redash/redash Docker image to create a pod in Kubernetes, plus two other pods for Redis and Postgres. After adding two data sources, all connections to the data sources fail with a 500 status.

Here is the log of the `rq healthcheck` command:
```
Traceback (most recent call last):
  File "manage.py", line 9, in <module>
    manager()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask/cli.py", line 586, in main
    return super(FlaskGroup, self).main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask/cli.py", line 426, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/app/redash/cli/rq.py", line 100, in healthcheck
    "worker_healthcheck", "worker", None, [(WorkerHealthcheck, {})]
  File "/usr/local/lib/python3.7/site-packages/supervisor_checks/check_runner.py", line 62, in __init__
    self._rpc_client = childutils.getRPCInterface(self._environment)
  File "/usr/local/lib/python3.7/site-packages/supervisor/childutils.py", line 21, in getRPCInterface
    return xmlrpclib.ServerProxy('http://127.0.0.1', getRPCTransport(env))
  File "/usr/local/lib/python3.7/site-packages/supervisor/childutils.py", line 15, in getRPCTransport
    return SupervisorTransport(u, p, env['SUPERVISOR_SERVER_URL'])
  File "/usr/local/lib/python3.7/os.py", line 681, in __getitem__
    raise KeyError(key) from None
KeyError: 'SUPERVISOR_SERVER_URL'
```

What is `SUPERVISOR_SERVER_URL`?
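
As far as I can tell from the traceback, the failure is just a missing environment variable: supervisord normally injects `SUPERVISOR_SERVER_URL` into the processes it manages, so the lookup presumably only fails when the healthcheck runs outside supervisord (e.g. via `kubectl exec`). A minimal sketch of the failing lookup (not Redash code, just the same `os.environ`-style dict access):

```python
import os

# supervisord is expected to set SUPERVISOR_SERVER_URL for its child
# processes; getRPCTransport reads it with a plain dict lookup, which
# raises KeyError when the variable was never injected.
env = dict(os.environ)
env.pop("SUPERVISOR_SERVER_URL", None)  # simulate running outside supervisord

try:
    transport_url = env["SUPERVISOR_SERVER_URL"]
except KeyError as exc:
    print(f"KeyError: {exc}")  # same last line as the traceback above
```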

And here is the log of the test connection:
```
[2021-10-20 09:08:26 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:11)
[2021-10-20 09:08:26,843][PID:11][INFO][metrics] method=POST path=/api/data_sources/1/test endpoint=datasourcetestresource status=500 content_type=? content_length=-1 duration=30468.55 query_count=4 query_duration=17.50
[2021-10-20 09:08:26 +0000] [11] [INFO] Worker exiting (pid: 11)
[2021-10-20 09:08:27 +0000] [359] [INFO] Booting worker with pid: 359
```

Technical details:

* Redash Version: 10.0.0
* Browser/OS: Linux
* How did you install Redash: from the Docker image, in a Kubernetes cluster

And when I go to the Queries tab, I see this log:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.7/site-packages/flask_restful/__init__.py", line 458, in wrapper
    resp = resource(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask_login/utils.py", line 261, in decorated_view
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask/views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "/app/redash/handlers/base.py", line 33, in dispatch_request
    return super(BaseResource, self).dispatch_request(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/flask_restful/__init__.py", line 573, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/app/redash/handlers/query_results.py", line 462, in get
    job = Job.fetch(job_id)
  File "/usr/local/lib/python3.7/site-packages/rq/job.py", line 299, in fetch
    job.refresh()
  File "/usr/local/lib/python3.7/site-packages/rq/job.py", line 518, in refresh
    raise NoSuchJobError('No such job: {0}'.format(self.key))
rq.exceptions.NoSuchJobError: No such job: b'rq:job:2417800e-2736-4368-9d8a-086b415fedd1'
[2021-10-20 09:13:34,503][PID:10][INFO][metrics] method=GET path=/api/jobs/2417800e-2736-4368-9d8a-086b415fedd1 endpoint=job status=500 content_type=application/json content_length=36 duration=8.44 query_count=2 query_duration=7.5
```

And I see this when testing the data source:

```
Testing connection to data source: torob (id=1)
Failure: HTTPSConnectionPool(host='api.appmetrica.yandex.com', port=443): Max retries exceeded with url: /management/v1/applications (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fc4391f8fd0>: Failed to establish a new connection: [Errno 110] Connection timed out'))
```
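
Errno 110 (connection timed out) suggests the pod itself has no network path to the data source host. A generic TCP probe one could run from inside the pod to check egress (this is not Redash code; the host and port are just the ones from the failure message above):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and DNS failures alike
        return False

# e.g. from inside the Redash pod, check egress to the failing host:
# can_connect("api.appmetrica.yandex.com", 443)
```

If this returns False inside the pod but the host is reachable elsewhere, a NetworkPolicy, egress firewall, or missing DNS/proxy configuration in the cluster is the likely culprit rather than Redash.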

We don't have official docs for deploying on K8s and I'm not an expert, but I think the issue is that you have multiple instances of Redis/Postgres running and the workers are confused; this is why you see the NoSuchJobError. I would try reducing to a single Redis instance and see what happens.
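
The symptom matches that explanation: a job enqueued in one Redis instance cannot be fetched from another. A conceptual sketch (plain dicts stand in for two Redis instances; the names and helper are illustrative, not Redash/rq code):

```python
# Two dicts stand in for two separate Redis instances.
web_redis = {}     # the instance the web pod enqueues jobs into
worker_redis = {}  # a second instance another component happens to talk to

job_key = "rq:job:2417800e-2736-4368-9d8a-086b415fedd1"
web_redis[job_key] = {"status": "queued"}

def fetch_job(store, key):
    """Mimic rq's Job.fetch: fail if the key is absent from *this* store."""
    if key not in store:
        raise LookupError(f"No such job: {key}")
    return store[key]

print(fetch_job(web_redis, job_key)["status"])  # found: same store

try:
    fetch_job(worker_redis, job_key)            # different store: not found
except LookupError as exc:
    print(exc)
```

Pointing every Redash component (server, scheduler, workers) at the same `REDASH_REDIS_URL` removes this failure mode.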

Also, you can search the forum; we tag any posts related to Kubernetes. Hopefully we can add first-class documentation for deploying this way in the future :crossed_fingers: