Unix & Linux Asked by Viscosity on December 19, 2021
This has been an irritating enough problem for long enough that I thought I would finally ask the community at large what a possible solution might be. It's even more irritating that I seem to be the only one experiencing this issue.
Essentially, on CentOS 7.x, any time the sshd config (or any other part of sshd) gets modified and the daemon gets restarted/reloaded, then at some "random point" in the next 3 minutes all the ssh connections reset, and the server becomes unreachable via ssh for a few seconds.
This is especially a problem for Ansible, since it sometimes needs to make these changes to sshd itself and then reload it (for instance on new CentOS 7.x server builds). In later plays it then randomly can't connect over ssh, which blows up the rest of the playbook/plays for whichever host failed to be contacted. This is especially bad for a large host pattern: a few hosts will randomly complete, but the others fail at various stages along the playbook after sshd is manipulated. It is of note that nothing of the sort occurs on CentOS 5.x, 6.x, or even on Solaris.
The best I can do to avoid this is to insert a 90-second wait after any changes to sshd, and even this isn't totally foolproof. It also makes those playbooks take 20+ minutes to run if it's invoked 7-8 times.
Here are some facts on this environment:
All new installs are from official ISO DVDs.
Every server is a Hyper-V 2012 guest.
Every server which has this problem is CentOS 7.x.
Here is some actual output of the problem and some hacked-together workarounds:
The failure:
fatal: [voltron]: UNREACHABLE! => {"changed": false, "msg": "All items completed", "results": [{"_ansible_item_result": true, "item": ["rsync", "iotop", "bind-utils", "sysstat.x86_64", "lsof"], "msg": "Failed to connect to the host via ssh: Shared connection to voltron closed.\r\n", "unreachable": true}]}
Example of one of the changes to sshd:
- name: Configure sshd to disallow root logins for security purposes on CentOS and RedHat 7x servers.
  lineinfile:
    backup: yes
    dest: /etc/ssh/sshd_config
    regexp: '^(#PermitRootLogin)'
    line: "PermitRootLogin no"
    state: present
  when: (ansible_distribution == "CentOS" or ansible_distribution == "RedHat") and (ansible_distribution_major_version == "7")
  notify: sshd reload Linux 7x
The following handler:
- name: sshd reload Linux 7x
  systemd:
    state: restarted
    daemon_reload: yes
    name: sshd
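As an aside, that handler is named "reload" but uses state: restarted, which fully restarts sshd and tears down the shared ssh connections. If only sshd_config has changed, a reload (SIGHUP) is usually sufficient and leaves established sessions alone; a minimal sketch of that variant (daemon_reload is only needed when systemd unit files change, so it's dropped here):

- name: sshd reload Linux 7x
  systemd:
    name: sshd
    state: reloaded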
Finally, my ghetto fix to try and account for this problem:
- name: Wait a bit on CentOS/RedHat 7x servers to ensure changes don't mess up ssh and screw up further plays.
  pause:
    seconds: 90
  when: (ansible_distribution == "CentOS" or ansible_distribution == "RedHat") and (ansible_distribution_major_version == "7")
There has got to be a better solution than what I came up with, and it's hard to believe that everyone else encounters this and just puts up with it. Is there something I need to configure on CentOS 7.x servers to prevent this? Is there something in Ansible that can deal with it, such as multiple ssh attempts per play on the first failure?
Thanks in advance!
This seems to be a common problem. There is a patch for Ansible ssh retries from 2016.
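Assuming your Ansible version is new enough to include that patch, the retries can be turned on in ansible.cfg (a sketch; the option lives in the [ssh_connection] section, and can alternatively be set through the ANSIBLE_SSH_RETRIES environment variable):

[ssh_connection]
# retry ssh connections that fail while sshd is coming back up
retries = 3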
A better solution might be to wait for sshd to be ready to accept connections. Original thread with this Ansible code solution:
[VM creation tasks...]
- name: Wait for the Kickstart install to complete and the VM to reboot
  local_action: wait_for host={{ vm_hostname }} port=22 delay=30 timeout=1200 state=started
- name: Now configure the VM...
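Adapted to the question's case, the same wait_for trick can run right after the sshd handler fires, polling port 22 from the control machine until the OpenSSH banner is readable again. A sketch based on the wait_for documentation example; the host expression assumes ansible_host (or failing that, the inventory hostname) is the reachable address:

- name: Wait for sshd to come back after the restart
  wait_for:
    host: "{{ ansible_host | default(inventory_hostname) }}"
    port: 22
    search_regex: OpenSSH
    delay: 3
    timeout: 120
  connection: local

Because the check runs locally on the control machine (connection: local), it does not depend on the half-dead shared connection to the target.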
Answered by Nils on December 19, 2021
Rather than using the systemd module, try the service module:
- name: Restart secure shell daemon post configuration
  service:
    name: sshd
    state: restarted
Answered by DopeGhoti on December 19, 2021