Internal server error: "OperationalError: server closed the connection unexpectedly" on SQL: 'SELECT ... FROM organizations'

From time to time the UI start getting “internal server error” after a couple of refreshes it is start working again normally. This happens on every screen: login, dashboard, queries, etc… I can see in the log that redash-server is trying to make the following SQL query:

[SQL: 'SELECT organizations.updated_at AS organizations_updated_at, organizations.created_at AS organizations_created_at, organizations.id AS organizations_id, organizations.name AS organizations_name, organizations.slug AS organizations_slug, organizations.settings AS organizations_settings \nFROM organizations \nWHERE organizations.slug = %(slug_1)s \n LIMIT %(param_1)s'] [parameters: {'slug_1': 'default', 'param_1': 1}] (Background on this error at: http://sqlalche.me/e/e3q8)

But it fails with OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly

I have selfhosted redash with the following setup:

  • standalone postgre machine
  • standalone redis machine
  • docker redash-server: redash/redash:8.0.0.b32245
  • docker redash-scheduler: redash/redash:8.0.0.b32245
  • docker redash-worker: redash/redash:8.0.0.b32245
  • docker ad-hoc-worker: redash/redash:8.0.0.b32245

The error stacktrace is:

Feb 22, 2020 @ 10:19:44.908	    cursor.execute(statement, parameters)
Feb 22, 2020 @ 10:19:44.908	  File "/usr/local/lib/python2.7/site-packages/flask_login/login_manager.py", line 317, in _load_user
Feb 22, 2020 @ 10:19:44.908		This probably means the server terminated abnormally
Feb 22, 2020 @ 10:19:44.908		before or while processing the request.
Feb 22, 2020 @ 10:19:44.908	 [SQL: 'SELECT organizations.updated_at AS organizations_updated_at, organizations.created_at AS organizations_created_at, organizations.id AS organizations_id, organizations.name AS organizations_name, organizations.slug AS organizations_slug, organizations.settings AS organizations_settings \nFROM organizations \nWHERE organizations.slug = %(slug_1)s \n LIMIT %(param_1)s'] [parameters: {'slug_1': 'default', 'param_1': 1}] (Background on this error at: http://sqlalche.me/e/e3q8)
Feb 22, 2020 @ 10:19:44.908	    response = self.full_dispatch_request()
Feb 22, 2020 @ 10:19:44.908	  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
Feb 22, 2020 @ 10:19:44.908	    exc_info
Feb 22, 2020 @ 10:19:44.908	  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 265, in raise_from_cause
Feb 22, 2020 @ 10:19:44.908	    context)
Feb 22, 2020 @ 10:19:44.908	  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1982, in wsgi_app
Feb 22, 2020 @ 10:19:44.908	    reraise(type(exception), exception, tb=exc_tb, cause=cause)
Feb 22, 2020 @ 10:19:44.908	  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
Feb 22, 2020 @ 10:19:44.908	  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 509, in do_execute
Feb 22, 2020 @ 10:19:44.908	OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
Feb 22, 2020 @ 10:19:44.907	Traceback (most recent call last):
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1609, in full_dispatch_request
Feb 22, 2020 @ 10:19:44.907	    user = self.user_callback(user_id)
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1614, in full_dispatch_request
Feb 22, 2020 @ 10:19:44.907	    request_started.send(self)
Feb 22, 2020 @ 10:19:44.907	  File "/app/redash/authentication/__init__.py", line 48, in load_user
Feb 22, 2020 @ 10:19:44.907	    org = current_org._get_current_object()
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/werkzeug/local.py", line 302, in _get_current_object
Feb 22, 2020 @ 10:19:44.907	    return self.__local()
Feb 22, 2020 @ 10:19:44.907	  File "/app/redash/authentication/org_resolving.py", line 18, in _get_current_org
Feb 22, 2020 @ 10:19:44.907	    g.org = Organization.get_by_slug(slug)
Feb 22, 2020 @ 10:19:44.907	  File "/app/redash/models/organizations.py", line 33, in get_by_slug
Feb 22, 2020 @ 10:19:44.907	    return cls.query.filter(cls.slug == slug).first()
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2895, in first
Feb 22, 2020 @ 10:19:44.907	    ret = list(self[0:1])
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2687, in __getitem__
Feb 22, 2020 @ 10:19:44.907	    return list(res)
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2995, in __iter__
Feb 22, 2020 @ 10:19:44.907	    return self._execute_and_instances(context)
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 3018, in _execute_and_instances
Feb 22, 2020 @ 10:19:44.907	    result = conn.execute(querycontext.statement, self._params)
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
Feb 22, 2020 @ 10:19:44.907	    return meth(self, multiparams, params)
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
Feb 22, 2020 @ 10:19:44.907	    return connection._execute_clauseelement(self, multiparams, params)
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
Feb 22, 2020 @ 10:19:44.907	    compiled_sql, distilled_params
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
Feb 22, 2020 @ 10:19:44.907	    context)
Feb 22, 2020 @ 10:19:44.907	    rv = self.handle_user_exception(e)
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/blinker/base.py", line 267, in send
Feb 22, 2020 @ 10:19:44.907	    for receiver in self.receivers_for(sender)]
Feb 22, 2020 @ 10:19:44.907	  File "/app/redash/models/users.py", line 54, in update_user_active_at
Feb 22, 2020 @ 10:19:44.907	    if current_user.is_authenticated and not current_user.is_api_user():
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/werkzeug/local.py", line 343, in __getattr__
Feb 22, 2020 @ 10:19:44.907	    return getattr(self._get_current_object(), name)
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/werkzeug/local.py", line 302, in _get_current_object
Feb 22, 2020 @ 10:19:44.907	    return self.__local()
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/flask_login/utils.py", line 26, in <lambda>
Feb 22, 2020 @ 10:19:44.907	    current_user = LocalProxy(lambda: _get_user())
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/flask_login/utils.py", line 302, in _get_user
Feb 22, 2020 @ 10:19:44.907	    current_app.login_manager._load_user()
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1517, in handle_user_exception
Feb 22, 2020 @ 10:19:44.907	    return self.reload_user()
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/flask_restful/__init__.py", line 271, in error_router
Feb 22, 2020 @ 10:19:44.907	    return original_handler(e)
Feb 22, 2020 @ 10:19:44.907	[2020-02-22 08:19:44,905] ERROR in app: Exception on /api/organization/status [GET]
Feb 22, 2020 @ 10:19:44.907	    reraise(exc_type, exc_value, tb)
Feb 22, 2020 @ 10:19:44.907	  File "/usr/local/lib/python2.7/site-packages/flask_login/login_manager.py", line 279, in reload_user

However thank you for this great tool ! Appreciate your work :slight_smile:

I’m not familiar with python but I guess that you are missing ping option for the DB connections. After a little research and looking in your code I think that this will fix the issue:

class SQLAlchemy(_BaseSQLAlchemy):
    def apply_pool_defaults(self, app, options):
        super(SQLAlchemy, self).apply_pool_defaults(self, app, options)
        options["pool_pre_ping"] = True

your setup is:

def apply_pool_defaults(self, app, options):
        super(RedashSQLAlchemy, self).apply_pool_defaults(app, options)
        if settings.SQLALCHEMY_DISABLE_POOL:
            options["poolclass"] = NullPool
            # Remove options NullPool does not support:
            options.pop("max_overflow", None)

The options["pool_pre_ping"] = True should resolve the issue. I have never used python so it will be very hard for me to test this and make pull request sorry :frowning:

Documentation about this option: https://docs.sqlalchemy.org/en/13/core/pooling.html#sqlalchemy.pool.Pool.params.pre_ping

pre_ping –

if True, the pool will emit a “ping” (typically “SELECT 1”, but is dialect-specific) on the connection upon checkout, to test if the connection is alive or not. If not, the connection is transparently re-connected and upon success, all other pooled connections established prior to that timestamp are invalidated. Requires that a dialect is passed as well to interpret the disconnection error.

I think that this will resolve the issue :slight_smile:

Anyone else seeing this issue? For me it seems to appear when user has idled enough long time (30 minutes maybe) and then trying to change a Redash view.

I haven’t yet tried to add pool_pre_ping to see if it would be solving the issue.

Yes. I have tested it and it is working. It is in the master brach already. To use it set environment variable like this SQLALCHEMY_ENABLE_POOL_PRE_PING: true

This is the pull request from github that fixes the issue: https://github.com/getredash/redash/pull/4741

1 Like

I will test that setting in our swarm setup and report back if it cures our issue.

1 Like

Enabling that setting seems to have cured all the internal server errors that were happening after idle periods.