Redash version - Re:dash 0.11.1+b2095
Hosted in our own AWS environment, with a PostgreSQL RDS instance (t2.micro) as the Redash metadata database.

This morning I was not able to log in to the Redash dashboards/queries. It failed intermittently at first, and after some time the page stopped loading altogether.
I checked the logs, and this is what I saw:

[2016-09-30 10:36:09 +0000] [1216] [ERROR] Exception in worker process:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/arbiter.py", line 515, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/workers/base.py", line 122, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/workers/base.py", line 130, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/wsgiapp.py", line 65, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/app/wsgiapp.py", line 52, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python2.7/dist-packages/gunicorn/util.py", line 357, in import_app
    __import__(module)
  File "/opt/redash/redash.0.11.1.b2095/redash/__init__.py", line 68, in <module>
    reset_new_version_status()
  File "/opt/redash/redash.0.11.1.b2095/redash/version_check.py", line 34, in reset_new_version_status
    latest_version = get_latest_version()
  File "/opt/redash/redash.0.11.1.b2095/redash/version_check.py", line 40, in get_latest_version
    return redis_connection.get(REDIS_KEY)
  File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 880, in get
    return self.execute_command('GET', name)
  File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 579, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 585, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 577, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 255, in read_response
    raise error
BusyLoadingError: Redis is loading the dataset in memory
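
For reference, a quick way to confirm this state would be something like the sketch below (assumes redis-py, which the traceback shows is installed, and a local Redis on the default port; adjust host/port for your setup):

```python
# Sketch: check whether Redis has finished loading its dataset after a restart.
# Assumes redis-py and a local Redis on the default port -- adjust as needed.
import redis

r = redis.StrictRedis(host='localhost', port=6379)

# INFO is one of the few commands Redis still answers while it is loading;
# 'loading' stays 1 until the dataset is fully read back into memory.
info = r.info('persistence')
print('redis still loading:', bool(info.get('loading')))
```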

I saw this BusyLoadingError a few times, and after that I started seeing the error below.

[2016-09-30 10:42:15,821][PID:1249][ERROR][redash] Exception on /scripts/plugins.978f5b30.js [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python2.7/dist-packages/flask_restful/__init__.py", line 271, in error_router
    return original_handler(e)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1473, in full_dispatch_request
    rv = self.preprocess_request()
  File "/usr/local/lib/python2.7/dist-packages/flask/app.py", line 1666, in preprocess_request
    rv = func()
  File "/opt/redash/redash.0.11.1.b2095/redash/models.py", line 42, in connect_db
    self.database.connect()
  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 3004, in connect
    self.initialize_connection(self.__local.conn)
  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 2922, in __exit__
    reraise(new_type, new_type(*exc_value.args), traceback)
  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 3002, in connect
    **self.connect_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/playhouse/postgres_ext.py", line 373, in _connect
    conn = super(PostgresqlExtDatabase, self)._connect(database, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/peewee.py", line 3316, in _connect
    conn = psycopg2.connect(database=database, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/psycopg2/__init__.py", line 164, in connect
    conn = _connect(dsn, connection_factory=connection_factory, async=async)
OperationalError: FATAL: remaining connection slots are reserved for non-replication superuser connections

I searched online, and it seems the
remaining connection slots are reserved for non-replication superuser connections
error shows up when you reach max_connections on the database.

So I logged on to the AWS console to verify that, but the connection count was 0. Memory usage, however, was approaching its limit.
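
For anyone hitting the same error, a more direct way to compare the in-use connections against the limit would be a sketch like this against the metadata database itself (uses psycopg2, which Redash already depends on; host/user/password are placeholders):

```python
# Sketch: compare the current connection count against max_connections on the
# Redash metadata database, instead of relying on the console graphs.
# Connection details below are placeholders.
import psycopg2

conn = psycopg2.connect(host='my-metadata-rds-endpoint', dbname='redash',
                        user='redash', password='...')
cur = conn.cursor()

# Connections currently open against the instance.
cur.execute("SELECT count(*) FROM pg_stat_activity;")
in_use = cur.fetchone()[0]

# max_connections: the FATAL error fires once only the
# superuser-reserved slots are left.
cur.execute("SHOW max_connections;")
max_conn = int(cur.fetchone()[0])

print('connections in use: %d of %d' % (in_use, max_conn))
conn.close()
```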

In the end I had to restart both the Redash application instance and the RDS instance to get things working again.

I have a few questions regarding this issue. It would be great if you could help:

  1. What could be the reason for the high memory usage on the Redash metadata instance? Is it because a lot of queries were trying to fetch results from my database in parallel? (My data resides in Redshift.)
  2. In what scenario could the Redash metadata instance reach maximum connections? Is it based on the number of users?
  3. How can I debug further to find the root cause of this issue? (The only extra check I can think of is in the sketch after this list.)
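
Regarding question 3, the only extra check I can think of is to look at what is actually running on the metadata database, roughly like the sketch below (psycopg2 again, placeholder credentials):

```python
# Sketch: list the longest-running, non-idle sessions on the metadata database
# to see what is holding connections or generating I/O. Placeholders again.
import psycopg2

conn = psycopg2.connect(host='my-metadata-rds-endpoint', dbname='redash',
                        user='redash', password='...')
cur = conn.cursor()
cur.execute("""
    SELECT pid, state, now() - query_start AS duration, query
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY duration DESC NULLS LAST
    LIMIT 20;
""")
for pid, state, duration, query in cur.fetchall():
    # Truncate the query text so the output stays readable.
    print(pid, state, duration, (query or '')[:80])
conn.close()
```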

Let me know if you need any more information.

Update:

  1. It seems the issue came from a high number of ReadIOPS on the metadata RDS instance.
    I am still trying to figure out what triggered that many ReadIOPS.
    Could it be the number of parallel connections/users logging in?

  2. I am not able to query the events table. It just never returns a result.

  3. I still see a high number of WriteIOPS (a sketch for pulling these metrics from a script is after this list).
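
A sketch for pulling these RDS metrics outside of the console (assumes boto3 with AWS credentials configured; the region and the instance identifier 'redash-metadata' are placeholders):

```python
# Sketch: fetch recent RDS metrics for the metadata instance from CloudWatch.
# Region and DBInstanceIdentifier are placeholders.
import datetime
import boto3

cloudwatch = boto3.client('cloudwatch', region_name='us-east-1')
now = datetime.datetime.utcnow()

for metric in ('ReadIOPS', 'WriteIOPS', 'DatabaseConnections', 'FreeableMemory'):
    stats = cloudwatch.get_metric_statistics(
        Namespace='AWS/RDS',
        MetricName=metric,
        Dimensions=[{'Name': 'DBInstanceIdentifier', 'Value': 'redash-metadata'}],
        StartTime=now - datetime.timedelta(hours=6),
        EndTime=now,
        Period=300,               # 5-minute averages
        Statistics=['Average'],
    )
    points = sorted(stats['Datapoints'], key=lambda p: p['Timestamp'])
    print(metric, [round(p['Average'], 1) for p in points[-6:]])  # last ~30 min
```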

How many users do you have?

There are around 12-15 user accounts, but not everybody is using it actively as of now.
Also, I am able to query the events table now; it seems normal again.