I am using the pre-baked AMI, set up according to the docs.

Whenever there is an issue, I SSH into the box and restart Redash as described in the documentation below, but the restart never succeeds. This happens even on a brand new EC2 instance.

https://redash.io/help-onpremise/maintenance/ongoing-maintenance-and-basic-operations.html

Has anyone faced this issue? Can anyone help?

What instance size did you use?

us-west-2: ami-670cc507

EC2 instance - t2.small

What happens when you try to restart? What error message do you get? What command do you try to run?

These are the commands and their responses:

$ sudo supervisorctl restart all
redash_server: ERROR (abnormal termination)
redash_celery_scheduled: ERROR (abnormal termination)
redash_celery: ERROR (abnormal termination)

$ sudo supervisorctl restart redash_celery
redash_celery: ERROR (not running)
redash_celery: ERROR (abnormal termination)

$ sudo supervisorctl restart redash_server
redash_server: ERROR (not running)
redash_server: ERROR (abnormal termination)

I am blocked at this stage. I spun up a new instance, tried again, and hit the same issue. It does not seem to be related to low memory or huge query results. If restart does not work, then any issue at all leaves the instance unusable.

On the new instance, do you get these messages right after starting it?
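
One way to check that is to look at the process states right after boot, before any restart. A minimal sketch, assuming the usual `supervisorctl status` output layout (process name in the first column, state in the second); the helper name `not_running` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads `supervisorctl status` output on stdin and
# prints the name and state of every process that is not RUNNING.
# Lines with fewer than two fields (e.g. blank lines) are skipped.
not_running() {
    awk 'NF >= 2 && $2 != "RUNNING" { print $1, $2 }'
}

# Usage on the instance (not executed here):
#   sudo supervisorctl status | not_running
```

If this prints nothing on a freshly booted instance, all processes started cleanly and the failure is tied to the restart itself.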

Have you checked the logs? (/opt/redash/logs/api_error.log and /opt/redash/logs/celery_error.log)
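
A quick way to look at the end of both files, as a rough sketch: the paths are the ones named above, while the helper name `show_log_tails` and the default of 50 lines are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the last N lines of the two error logs.
#   $1 = log directory (defaults to /opt/redash/logs)
#   $2 = number of lines per file (defaults to 50)
show_log_tails() {
    log_dir="${1:-/opt/redash/logs}"
    lines="${2:-50}"
    for name in api_error.log celery_error.log; do
        file="$log_dir/$name"
        echo "==> $file <=="
        if [ -r "$file" ]; then
            tail -n "$lines" "$file"
        else
            # The logs may be readable by root only.
            echo "(missing or not readable; try running the script with sudo)"
        fi
    done
}

# Usage on the instance (not executed here):
#   show_log_tails /opt/redash/logs 100
```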

I do not get these right after starting it; I get them upon restarting. I have checked those logs. There are a lot of error messages like these:

[2017-01-06 20:12:40,607: ERROR/MainProcess] Process 'Worker-3489' pid:None exited with 'exitcode None'
[2017-01-06 20:12:40,618: ERROR/MainProcess] Error on stopping Pool: KeyError(None,)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/celery/bootsteps.py", line 155, in send_all
    fun(parent, *args)
  File "/usr/local/lib/python2.7/dist-packages/celery/bootsteps.py", line 377, in stop
    return self.obj.stop()
  File "/usr/local/lib/python2.7/dist-packages/celery/concurrency/base.py", line 123, in stop
    self.on_stop()
  File "/usr/local/lib/python2.7/dist-packages/celery/concurrency/prefork.py", line 145, in on_stop
    self._pool.join()
  File "/usr/local/lib/python2.7/dist-packages/billiard/pool.py", line 1532, in join
    stop_if_not_current(self._result_handler)
  File "/usr/local/lib/python2.7/dist-packages/billiard/pool.py", line 151, in stop_if_not_current
    thread.stop(timeout)
  File "/usr/local/lib/python2.7/dist-packages/billiard/pool.py", line 500, in stop
    self.on_stop_not_started()
  File "/usr/local/lib/python2.7/dist-packages/celery/concurrency/asynpool.py", line 301, in on_stop_not_started
    join_exited_workers(shutdown=True)
  File "/usr/local/lib/python2.7/dist-packages/billiard/pool.py", line 1119, in _join_exited_workers
    del self._poolctrl[worker.pid]
KeyError: None
Traceback (most recent call last):
  File "/usr/local/bin/celery", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/celery/__main__.py", line 30, in main
    main()
  File "/usr/local/lib/python2.7/dist-packages/celery/bin/celery.py", line 81, in main
    cmd.execute_from_commandline(argv)
  File "/usr/local/lib/python2.7/dist-packages/celery/bin/celery.py", line 769, in execute_from_commandline
    super(CeleryCommand, self).execute_from_commandline(argv)))
  File "/usr/local/lib/python2.7/dist-packages/celery/bin/base.py", line 304, in execute_from_commandline
    argv = self.setup_app_from_commandline(argv)
  File "/usr/local/lib/python2.7/dist-packages/celery/bin/base.py", line 464, in setup_app_from_commandline
    self.app = self.find_app(app)
  File "/usr/local/lib/python2.7/dist-packages/celery/bin/base.py", line 484, in find_app
    return find_app(app, symbol_by_name=self.symbol_by_name)
  File "/usr/local/lib/python2.7/dist-packages/celery/app/utils.py", line 222, in find_app
    sym = symbol_by_name(app, imp=imp)
  File "/usr/local/lib/python2.7/dist-packages/celery/bin/base.py", line 487, in symbol_by_name

Can you include the full log files (all of them in /opt/redash/logs)? If you are afraid they might include something private, you can email them directly to me (arik at redash.io).
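
To make that easier, the whole directory can be packed into a single archive. This is only a sketch: the helper name `bundle_logs` and the default output name `redash-logs.tar.gz` are made up here, and the log directory is the one mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: pack a log directory into a gzipped tarball.
#   $1 = log directory (defaults to /opt/redash/logs)
#   $2 = output archive (defaults to redash-logs.tar.gz)
bundle_logs() {
    src_dir="${1:-/opt/redash/logs}"
    out_file="${2:-redash-logs.tar.gz}"
    # -C switches to the parent directory so the archive stores relative
    # paths such as "logs/api_error.log" instead of absolute ones.
    tar -czf "$out_file" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Usage on the instance (not executed here; run the script with sudo
# if the logs are readable by root only):
#   bundle_logs /opt/redash/logs "$HOME/redash-logs.tar.gz"
```

Running `tar -tzf` on the resulting archive lists its contents, which is worth doing before sending it anywhere in case something private is in there.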

Just for reference, I launched a new instance (t2.small) with the same AMI, and supervisorctl restart all works as expected. I also restarted the instance itself, and things still work :-\ I’m not sure how your instance got into this state, but maybe the log files will help us understand what went wrong.