Server Fault Asked by saabeilin on January 6, 2021
We are running Docker swarm mode in production with 4 nodes, 3 of which are managers. We’ve noticed that once every few days all the containers on one of the hosts get restarted.
I looked at syslog around that time and saw the following messages (repeated for every container scheduled on this host):
Jan 26 07:26:14 HOST0 dockerd[13104]: time="2019-01-26T07:26:14.954777646Z" level=warning msg="failed to deactivate service binding for container service_container.1.lhc0gejxgb8y340bg9o2wfcm2" error="No such container: service_container.1.lhc0gejxgb8y340bg9o2wfcm2" module=node/agent node.id=2g08blfds9z26ja2ou06pv2zl
There are some swarm membership-related messages preceding these, but they also appear when nothing goes wrong, and they are at level=info anyway.
It’s important to mention that on certain hosts we have single-instance stateful services like databases.
For now I need to understand what triggers the recreation of the services and how to avoid it. Is there anything in particular I should grep the logs for, to start with?
Thanks a lot in advance!
I suggest starting with the tasks of the service that restarted. It should give a reason for exiting, and the exit code of the container's PID 1 process.
docker service ps $SERVICE_NAME
will list the last 5 service tasks (or whatever your history limit is set to).
Take the ID of the exited task and use docker inspect $TASK_ID
to get the details.
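As a rough sketch, the two steps above could be wrapped in small shell helpers like the ones below. The function names list_tasks and task_status are made up for illustration; the flags (--no-trunc, --format, --type task) and template fields are ones I believe the Docker CLI supports, but check them against your version.

```shell
#!/bin/sh
# Sketch only: list_tasks and task_status are hypothetical helper names.

# List the retained task history for a service, with untruncated
# task IDs and error messages so nothing is cut off with "…".
list_tasks() {
    docker service ps --no-trunc \
        --format '{{.ID}} | {{.Name}} | {{.Node}} | {{.CurrentState}} | {{.Error}}' \
        "$1"
}

# Print only the Status section of one task as JSON. For a task whose
# container exited, this is where the error text and exit code appear.
task_status() {
    docker inspect --type task --format '{{json .Status}}' "$1"
}

# Example (needs a swarm manager node):
#   list_tasks "$SERVICE_NAME"
#   task_status "$TASK_ID"
```

The --no-trunc flag matters here because the default output shortens both the task ID and the error column, and the full error text is often the quickest hint at why a task was replaced.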
Answered by King Chung Huang on January 6, 2021