tl;dr: the minimum EC2 instance size for V10 is t2.medium. Any ideas why?
Background
One of our V10 release follow-ups is to build new cloud images for V10. We build these images by running our setup script through Packer. Until this work is finished, you can still use V10 on EC2: deploy a V8 image and upgrade it.
Per our documentation:
Launch the instance with the pre-baked AMI we create (for small deployments t2.small should be enough):
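For anyone scripting that launch step, it might look roughly like this with the AWS CLI (the AMI ID below is a placeholder, not a real image):

```shell
# Sketch: launch a Redash instance from the pre-baked AMI.
# The AMI ID passed in is a placeholder -- substitute the real one.
launch_redash() {
  local ami_id="$1" instance_type="$2"
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type "$instance_type" \
    --count 1
}
```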
The Problem
A t2.small instance doesn't have enough RAM for V10. We didn't expect this, and it's quite annoying: the V8 image deploys on t2.small and runs normally, but once you upgrade to V10 and restart the containers, the instance becomes unresponsive. You need to change the instance type to t2.medium before service is restored.
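For reference, the resize can be scripted with the AWS CLI. This is a sketch of the general procedure, not our official runbook (the instance ID is whatever you launched):

```shell
# Sketch: stop an instance, change its type, and start it again.
# The instance must be fully stopped before the type can change, and a
# stop/start cycle may move the instance to new hardware (public IP can change).
resize_instance() {
  local instance_id="$1" new_type="$2"
  aws ec2 stop-instances --instance-ids "$instance_id"
  aws ec2 wait instance-stopped --instance-ids "$instance_id"
  aws ec2 modify-instance-attribute \
    --instance-id "$instance_id" \
    --instance-type "Value=$new_type"
  aws ec2 start-instances --instance-ids "$instance_id"
}
```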
V8 Resource Consumption
Here's the output from `sudo docker stats` on a V8 instance that wasn't upgraded. Notice total available RAM is just under 2 GiB, and about 75% of it is in use.
```
CONTAINER ID   NAME                        CPU %   MEM USAGE / LIMIT     MEM %    NET I/O           BLOCK I/O         PIDS
074069f3c875   redash_nginx_1              0.00%   2.492MiB / 1.945GiB   0.13%    8.01MB / 3.52MB   34.8MB / 0B       2
ba89d824f682   redash_adhoc_worker_1       0.05%   291.7MiB / 1.945GiB   14.65%   126MB / 59.3MB    101MB / 0B        3
6aba3ac4a872   redash_scheduler_1          0.17%   274.8MiB / 1.945GiB   13.80%   138MB / 81.8MB    645MB / 12.1MB    3
39d98fe4cf2a   redash_server_1             0.01%   670.3MiB / 1.945GiB   33.65%   1.16MB / 8.62MB   302MB / 0B        5
b2a9ccd1285b   redash_scheduled_worker_1   0.13%   190.2MiB / 1.945GiB   9.55%    126MB / 68MB      90.3MB / 0B       2
1e40cfb160a2   redash_postgres_1           0.01%   13.39MiB / 1.945GiB   0.67%    6.09MB / 5.45MB   382MB / 134MB     12
bc73e56ac9a2   redash_redis_1              0.18%   2.477MiB / 1.945GiB   0.12%    204MB / 385MB     84.6MB / 71.4MB   4
```
V10 Resource Consumption
Here's the output from `sudo docker stats` on a V8 instance upgraded to V10. Total available RAM is just under 4 GiB (because this is a t2.medium instance) and total RAM consumption is around 2.1 GiB.
```
CONTAINER ID   NAME                    CPU %   MEM USAGE / LIMIT     MEM %    NET I/O           BLOCK I/O        PIDS
d013031b8481   redash_nginx_1          0.00%   3.188MiB / 3.842GiB   0.08%    8.7MB / 4.42MB    0B / 0B          2
e44e049a70cf   redash_scheduler_1      0.00%   195.1MiB / 3.842GiB   4.96%    10.5MB / 18.7MB   0B / 0B          2
e0e0d1829567   redash_adhoc_worker_1   0.03%   589MiB / 3.842GiB     14.97%   4.49MB / 6.42MB   627kB / 0B       7
61bcba583f8e   redash_worker_1         0.03%   590.3MiB / 3.842GiB   15.01%   35.3MB / 54.4MB   77.8kB / 0B      7
7eb1a4502a89   redash_server_1         0.01%   772.1MiB / 3.842GiB   19.63%   1.61MB / 9.19MB   1.15MB / 0B      9
ff160d1d6d48   redash_postgres_1       0.00%   11.86MiB / 3.842GiB   0.30%    11.1MB / 11.5MB   762kB / 42.8MB   10
149b61ecedcd   redash_redis_1          0.15%   2.473MiB / 3.842GiB   0.06%    69.6MB / 39.9MB   0B / 2.54MB      4
```
Analysis
The V10 instance containers are using more than the maximum RAM available on the t2.small instance (2 GiB). This means the system starts swapping, which is slow. This is the cause of degraded performance on the upgraded system.
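To sanity-check the totals, here's a small awk sketch that sums the MEM USAGE column from `docker stats` output (it assumes values are reported in MiB or GiB, as in the output above):

```shell
# Sum container memory usage in MiB from lines like "589MiB / 3.842GiB".
sum_mem() {
  awk '{u = $1
        if (u ~ /GiB/) { sub(/GiB/, "", u); u *= 1024 }
        else           { sub(/MiB/, "", u) }
        total += u}
       END { printf "%.0f MiB\n", total }'
}

# Usage against a live instance:
# sudo docker stats --no-stream --format '{{.MemUsage}}' | sum_mem
```

Fed the V10 numbers above, this reports 2164 MiB — comfortably more than the roughly 2 GiB a t2.small offers.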
What isn't clear is why the containers use more memory in V10 than in V8. The only directly comparable containers are the `redis`, `postgres`, and `nginx` containers (which come from images we don't maintain).
From what I can tell, there's no reason V10 should require this much RAM for a basic deployment, but I'd like to figure this out before we build the new images, in case there's an easy tweak we can make to the setup script that avoids it.