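Following the fixes suggested online, I first assumed the problem was caused by the systemd version, but a check showed that the cgroup driver in use on site was still cgroupfs.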
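So I went back to digging through the logs. Meanwhile, testing on site showed that scaling up containers that do not use port mapping did not cause the node to lose contact. The site does not use NodePort. It uses hostPort, with the source code modified so that mapped ports are opened on the host at random, which means that every container started produces iptables rules through docker. My first guess was that the code might be at fault, but it had been running for nearly three years, and a code bug surfacing only now is hard to justify.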
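To make the "one set of iptables rules per started container" point concrete, here is a minimal sketch that lists the host ports Docker has published, given the text output of `iptables-save`. It assumes Docker's usual behaviour of adding one DNAT rule per published port to the `DOCKER` chain of the nat table. The `SAMPLE` text, its ports, and its addresses are invented for illustration and are not taken from the site.

```python
import re

# Matches DNAT rules in the DOCKER chain as printed by iptables-save.
# Assumption: one such rule exists per port published by Docker.
DNAT_RULE = re.compile(
    r"^-A DOCKER .*--dport (?P<port>\d+)\b.*-j DNAT --to-destination (?P<dest>\S+)"
)


def published_ports(iptables_save_text):
    """Return (host_port, destination) pairs for each DNAT rule in the DOCKER chain."""
    pairs = []
    for line in iptables_save_text.splitlines():
        match = DNAT_RULE.match(line.strip())
        if match:
            pairs.append((int(match.group("port")), match.group("dest")))
    return pairs


# Hypothetical iptables-save excerpt, for illustration only.
SAMPLE = """\
*nat
:DOCKER - [0:0]
-A DOCKER -i docker0 -j RETURN
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 31001 -j DNAT --to-destination 172.17.0.2:8080
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 31002 -j DNAT --to-destination 172.17.0.3:8080
COMMIT
"""

if __name__ == "__main__":
    for port, dest in published_ports(SAMPLE):
        print(port, "->", dest)
```

On a real node the input would come from running `iptables-save` as root. The length of the returned list gives a rough idea of how many rules the port mappings account for.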
Locating the real cause:
The docker logs showed processes blocking one another on a lock, but they did not reveal the specific cause:
essage.txt:May 13 10:27:47 paas-10-239-40-157 dockerd: is currently holding the xtables lock; waiting (47s) for it to exit...\nAnother app is currently holding the xtables lock; waiting (49s) for it to exit...\nAnother app is currently holding the xtables lock; waiting (51s) for it to exit...\nAnother app is currently holding the xtables lock; waiting (53s) for it to exit...\nAnother app is currently holding the xtables lock; waiting (55s) for it to exit...\nAnother app is currently holding the xtables lock; waiting (57s) for it to exit...\n"
Then, while connected remotely to the on-site server, I ran top and saw one process sitting at 100% CPU.
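One way to trace a busy process back to the service that spawned it is to walk its parent chain through `/proc`. The sketch below is a Linux-only illustration, not the exact commands used on site. The helper names `parse_stat` and `parent_chain` are made up for this example.

```python
import os


def parse_stat(stat_text):
    """Parse the contents of /proc/<pid>/stat into (command name, parent PID)."""
    # The command name is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the last ')' rather than on whitespace.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    comm = stat_text[open_idx + 1:close_idx]
    fields = stat_text[close_idx + 1:].split()
    # fields[0] is the process state, fields[1] is the parent PID.
    return comm, int(fields[1])


def parent_chain(pid, read_stat=None):
    """Walk from pid upwards until the parent PID is 0, returning [(pid, comm), ...]."""
    if read_stat is None:
        def read_stat(p):
            with open(f"/proc/{p}/stat") as f:
                return f.read()
    chain = []
    while pid > 0:
        comm, ppid = parse_stat(read_stat(pid))
        chain.append((pid, comm))
        pid = ppid
    return chain


if __name__ == "__main__":
    # Demonstrate on the current process; substitute the PID seen in top.
    for pid, comm in parent_chain(os.getpid()):
        print(pid, comm)
```

Passing the PID reported by top to `parent_chain` prints the process followed by each ancestor in turn.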
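Checking that process's parent showed it was kube-proxy, and kube-proxy's logs turned out to contain a large number of lock contention messages between processes.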
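The lock waits reported in the dockerd excerpt above can be tallied with a few lines of Python, which helps when the same message is repeated many times in a log. This is a minimal sketch: the regular expression is based only on the message format visible in that excerpt, and `SAMPLE` is a shortened copy of it, with the `\n` sequences kept as literal characters as they appear in the log.

```python
import re

# Matches the wait duration in "... holding the xtables lock; waiting (47s) ...".
WAIT = re.compile(r"holding the xtables lock; waiting \((\d+)s\)")


def lock_waits(log_text):
    """Return every reported xtables lock wait, in seconds, in order of appearance."""
    return [int(seconds) for seconds in WAIT.findall(log_text)]


def longest_wait(log_text):
    """Return the longest reported wait in seconds, or None if there is none."""
    waits = lock_waits(log_text)
    return max(waits) if waits else None


# Shortened copy of the dockerd log excerpt quoted above.
SAMPLE = (
    r"dockerd: is currently holding the xtables lock; waiting (47s) for it to exit...\n"
    r"Another app is currently holding the xtables lock; waiting (49s) for it to exit...\n"
    r"Another app is currently holding the xtables lock; waiting (51s) for it to exit...\n"
)

if __name__ == "__main__":
    print(lock_waits(SAMPLE))    # [47, 49, 51]
    print(longest_wait(SAMPLE))  # 51
```

Waits that keep climbing, as in the excerpt, suggest that whichever process holds the lock is not releasing it between attempts.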