Issue Summary
In the Admin section, the Celery Status tab intermittently displays “Failed loading status. Please refresh.”. Sometimes a reload fixes it, but after Redash has been left unattended for a couple of days, reloading no longer helps; only restarting the scheduler or Redis makes the view available again.
Technical details:
- Redash Version: redash/redash:7.0.0.b18042
- Browser/OS: Chrome 75.0.3770.100 / macOS 10.14.5 (18F132)
- How did you install Redash:
Migrated a Redash 3.0.0+b3134 local install on Ubuntu 16.04.4 LTS (Xenial Xerus) to the 7.0.0.b18042 Docker image on CoreOS stable.
The migration is in fact a parallel run. I restored the 3.0.0 PostgreSQL database backup into AWS RDS, then ran the migrations in the following order using a Docker setup:
- redash/redash:3.0.0.b3147
- redash/redash:4.0.2.b4720
- redash/redash:5.0.0.b4754
- redash/redash:6.0.0.b8537
- redash/redash:7.0.0.b18042
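The upgrade order above can be sanity-checked with a few lines of Python. This is only a sketch: the helper names `major_version` and `skipped_majors` are made up for this example, and the idea that no major version should be skipped between steps is my own assumption about a safe upgrade path.

```python
# Image tags in the order the migrations were applied (copied from the list above).
MIGRATION_IMAGES = [
    "redash/redash:3.0.0.b3147",
    "redash/redash:4.0.2.b4720",
    "redash/redash:5.0.0.b4754",
    "redash/redash:6.0.0.b8537",
    "redash/redash:7.0.0.b18042",
]


def major_version(image):
    """Return the major version from a tag such as 'redash/redash:4.0.2.b4720'."""
    tag = image.rsplit(":", 1)[1]
    return int(tag.split(".", 1)[0])


def skipped_majors(images):
    """List the major versions missing between consecutive steps (hypothetical check)."""
    majors = [major_version(image) for image in images]
    missing = []
    for previous, current in zip(majors, majors[1:]):
        if current <= previous:
            raise ValueError(
                "images are not in ascending order: %d then %d" % (previous, current)
            )
        missing.extend(range(previous + 1, current))
    return missing


if __name__ == "__main__":
    print(skipped_majors(MIGRATION_IMAGES))  # prints []
```

An empty list means every major version from 3 to 7 was visited once, in ascending order.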
Everything seems successful, and queries (scheduled ones included) run perfectly fine. Occasionally, though, the message pops up and the API request behind https://redash.my.domain/admin/queries/tasks (/api/admin/queries/tasks) returns a 500.
Server log
[2019-07-25 08:52:59,827][PID:14][INFO][metrics] method=GET path=/admin/queries/tasks endpoint=redash_index status=304 content_type=text/html; charset=utf-8 content_length=926 duration=0.64 query_count=2 query_duration=4.27
[2019-07-25 08:53:00,768][PID:14][INFO][metrics] method=GET path=/api/session endpoint=redash_session status=200 content_type=application/json content_length=1331 duration=3.59 query_count=3 query_duration=6.02
[2019-07-25 08:53:00,880][PID:14][INFO][metrics] method=GET path=/api/organization/status endpoint=redash_organization_status status=200 content_type=application/json content_length=100 duration=33.92 query_count=7 query_duration=16.14
[2019-07-25 08:53:00,892][PID:14][INFO][metrics] method=GET path=/static/images/favicon-32x32.png endpoint=static status=200 content_type=image/png content_length=2005 duration=0.57 query_count=2 query_duration=4.66
[2019-07-25 08:53:01,000][PID:14][INFO][metrics] method=GET path=/api/dashboards/favorites endpoint=dashboard_favorites status=200 content_type=application/json content_length=55 duration=17.37 query_count=4 query_duration=10.72
[2019-07-25 08:53:01,036][PID:14][INFO][metrics] method=GET path=/api/queries/favorites endpoint=query_favorites status=200 content_type=application/json content_length=55 duration=22.94 query_count=4 query_duration=12.73
[2019-07-25 08:53:01,162] ERROR in app: Exception on /api/admin/queries/tasks [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1988, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1641, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python2.7/dist-packages/flask_restful/__init__.py", line 271, in error_router
    return original_handler(e)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1544, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1639, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1625, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/app/redash/permissions.py", line 48, in decorated
    return fn(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/flask_login/utils.py", line 228, in decorated_view
    return func(*args, **kwargs)
  File "/app/redash/handlers/admin.py", line 51, in queries_tasks
    'tasks': celery_tasks(),
  File "/app/redash/monitor.py", line 132, in celery_tasks
    tasks = parse_tasks(celery.control.inspect().active(), 'active')
  File "/usr/local/lib/python2.7/dist-packages/celery/app/control.py", line 108, in active
    return self._request('active')
  File "/usr/local/lib/python2.7/dist-packages/celery/app/control.py", line 95, in _request
    timeout=self.timeout, reply=True,
  File "/usr/local/lib/python2.7/dist-packages/celery/app/control.py", line 454, in broadcast
    limit, callback, channel=channel,
  File "/usr/local/lib/python2.7/dist-packages/kombu/pidbox.py", line 321, in _broadcast
    channel=chan)
  File "/usr/local/lib/python2.7/dist-packages/kombu/pidbox.py", line 360, in _collect
    self.connection.drain_events(timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line 301, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/kombu/transport/virtual/base.py", line 963, in drain_events
    get(self._deliver, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/kombu/transport/redis.py", line 366, in get
    ret = self.handle_event(fileno, event)
  File "/usr/local/lib/python2.7/dist-packages/kombu/transport/redis.py", line 348, in handle_event
    return self.on_readable(fileno), self
  File "/usr/local/lib/python2.7/dist-packages/kombu/transport/redis.py", line 344, in on_readable
    chan.handlers[type]()
  File "/usr/local/lib/python2.7/dist-packages/kombu/transport/redis.py", line 721, in _brpop_read
    **options)
  File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 768, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 636, in read_response
    raise e
ConnectionError: Error while reading from socket: (104, 'Connection reset by peer')
[2019-07-25 08:53:01,162][PID:14][ERROR][redash] Exception on /api/admin/queries/tasks [GET]
(traceback identical to the one above, ending in the same ConnectionError)
[2019-07-25 08:53:01,163][PID:14][INFO][metrics] method=GET path=/api/admin/queries/tasks endpoint=redash_queries_tasks status=500 content_type=? content_length=-1 duration=14.79 query_count=3 query_duration=6.03
[2019-07-25 08:53:02,069][PID:14][INFO][metrics] method=POST path=/api/events endpoint=events status=200 content_type=application/json content_length=4 duration=2.40 query_count=2 query_duration=4.86
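To see how often the endpoint actually fails, the `[metrics]` lines of the server log can be filtered by status. A minimal sketch, assuming the log format shown above stays stable; `parse_metrics_line` and `failed_requests` are names invented for this example, not part of Redash.

```python
import re

# Matches "key=value" pairs. Values stop at the first whitespace, so a value such
# as "text/html; charset=utf-8" is split and "charset" becomes its own field,
# which is good enough for reading path and status.
PAIR_RE = re.compile(r"(\w+)=(\S*)")


def parse_metrics_line(line):
    """Return the key=value fields of a '[metrics]' log line, or None for other lines."""
    marker = "[metrics] "
    if marker not in line:
        return None
    payload = line.split(marker, 1)[1]
    return dict(PAIR_RE.findall(payload))


def failed_requests(lines):
    """Yield (path, status) for every metrics line with a 5xx status."""
    for line in lines:
        fields = parse_metrics_line(line)
        if fields and fields.get("status", "").startswith("5"):
            yield fields["path"], int(fields["status"])
```

Feeding the server container's log through `failed_requests` would list each failing path with its status, which could then be lined up against the Redis log timestamps.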
This looks like a Redis connection issue, but the Redis container is up and the Redis debug logs don’t show anything unusual. I even set --tcp-timeout 0 to be sure.
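One thing that could be tested is whether a connection that sits idle gets dropped somewhere between the containers, independently of Redis's own settings. The sketch below is an assumption-laden probe, not a diagnosis: it speaks the Redis protocol over a plain socket, sends PING, stays silent for a configurable time, then sends PING again. The function names, the default host and port (taken from REDASH_REDIS_URL), and the idle duration are all illustrative. It is written for Python 3, so it would have to run from a separate container attached to redashnet rather than inside the Redash image, which is Python 2.7 according to the traceback.

```python
import socket
import time

PING = b"*1\r\n$4\r\nPING\r\n"  # RESP encoding of the PING command


def ping(sock):
    """Send PING and return True if the reply starts with +PONG."""
    sock.sendall(PING)
    return sock.recv(64).startswith(b"+PONG")


def probe_idle_connection(host="redis", port=6379, idle_seconds=600, timeout=5.0):
    """Ping, stay idle, ping again, and report whether the connection survived.

    Returns "alive" if both pings succeed, otherwise a string describing the
    failure (for example a reset like the errno 104 in the traceback above).
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if not ping(sock):
            return "unexpected reply to first PING"
        time.sleep(idle_seconds)  # no traffic at all during this window
        try:
            return "alive" if ping(sock) else "unexpected reply after idle period"
        except OSError as exc:  # includes connection resets, broken pipes and timeouts
            return "connection failed after idle period: %r" % (exc,)
```

Calling `probe_idle_connection()` with increasing `idle_seconds` values would show whether there is an idle duration beyond which the second PING starts failing.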
For reference, I am using a compose file based on the one provided in the setup instructions with some modifications:
- Compose file format 3.7, deployed in a single-node Docker swarm
- Workers use replicas instead of WORKERS_COUNT processes per container
- Entrypoint/command had to be hijacked in order to pip install ldap3
- The stack network (redashnet) driver is overlay as swarm doesn’t support bridge
- Front end is reverse proxied by Traefik with TLS termination, which is in another stack on the swarmnet network
Compose file
version: '3.7'
x-redash-service: &redash-service
  # This reflects the order DB migrations have been applied from version 3
  #image: redash/redash:3.0.0.b3147
  #image: redash/redash:4.0.2.b4720
  #image: redash/redash:5.0.0.b4754
  #image: redash/redash:6.0.0.b8537
  image: redash/redash:7.0.0.b18042
  depends_on:
    - redis
  env_file: /etc/redash.env
services:
  server:
    <<: *redash-service
    entrypoint: [bash]
    command: [-c, pip install ldap3 && /app/bin/docker-entrypoint server]
    # The below command is to be used in replacement of the above for DB schema upgrades
    #command: [-c, pip install ldap3 && /app/bin/docker-entrypoint manage db upgrade]
    networks:
      - swarmnet
      - redashnet
    ports:
      - 5000:5000
    environment:
      REDASH_WEB_WORKERS: 4
    deploy:
      replicas: 1
      labels:
        - traefik.enable=true
        - traefik.metrics.port=5000
        - traefik.metrics.frontend.rule=Host:${HOSTNAME}
  scheduler:
    <<: *redash-service
    entrypoint: [bash]
    command: [-c, pip install ldap3 && /app/bin/docker-entrypoint scheduler]
    networks:
      - redashnet
    environment:
      QUEUES: "celery"
      WORKERS_COUNT: 1
    deploy:
      replicas: 1
  scheduled-worker:
    <<: *redash-service
    entrypoint: [bash]
    command: [-c, pip install ldap3 && /app/bin/docker-entrypoint worker]
    networks:
      - redashnet
    environment:
      QUEUES: "scheduled_queries,schemas"
      WORKERS_COUNT: 1
    deploy:
      replicas: 2
  adhoc-worker:
    <<: *redash-service
    entrypoint: [bash]
    command: [-c, pip install ldap3 && /app/bin/docker-entrypoint worker]
    networks:
      - redashnet
    environment:
      QUEUES: "queries"
      WORKERS_COUNT: 1
    deploy:
      replicas: 1
  redis:
    image: redis:5.0-alpine
    command: redis-server --tcp-timeout 0 --loglevel verbose
    networks:
      - redashnet
networks:
  swarmnet:
    external: true
    name: swarmnet
  redashnet:
    name: redashnet
Redash environment file
# GENERAL
REDASH_REDIS_URL=redis://redis:6379/0
REDASH_HOST=https://redash.my.domain/
REDASH_LOG_LEVEL=DEBUG
REDASH_DATABASE_URL=postgresql://redash:redacted@redash-db.my.domain/redash
REDASH_COOKIE_SECRET=REDACTED
# LDAP
REDASH_PASSWORD_LOGIN_ENABLED=false
REDASH_LDAP_LOGIN_ENABLED=true
REDASH_LDAP_URL=ldaps://ldap.my.domain
REDASH_LDAP_BIND_DN=uid=redash,cn=sysaccounts,cn=etc,dc=domain,dc=my
REDASH_LDAP_BIND_DN_PASSWORD=REDACTED
REDASH_SEARCH_DN=cn=users,cn=accounts,dc=domain,dc=my
REDASH_LDAP_SEARCH_TEMPLATE=(uid=%(username)s)
How could I troubleshoot this further?
Thanks in advance