Running Docker containers are interrupted and stopped every 10-15 minutes

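At some point, our Docker for Windows test environment became very unstable. The cause was hard to spot at first, because the containers there run with --restart=always by default and it was only a test environment. In practice, a newly deployed container service would run normally for 10-15 minutes and then drop out briefly. Once the container was stopped, --restart=always made the Docker service bring up a replacement as fast as it could. So all we noticed was that the container applications we had written lately seemed to stall every so often. It was only while running docker ps one day that I saw the containers' start times looked odd.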
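The odd start times can also be checked with a script rather than by eye. The following is a minimal sketch, not what we actually ran: it assumes `docker inspect` reports `State.StartedAt` as a UTC timestamp ending in `Z`, and the function names and the 15-minute window are my own choices for illustration.

```python
import subprocess
from datetime import datetime, timedelta, timezone


def parse_started_at(stamp):
    """Parse a UTC timestamp like 2020-08-27T02:50:18.1234567Z (assumed format)."""
    base, _, frac = stamp.strip().rstrip("Z").partition(".")
    moment = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    # Docker may print more than six fractional digits; keep microseconds only.
    micros = int((frac + "000000")[:6]) if frac else 0
    return moment.replace(microsecond=micros, tzinfo=timezone.utc)


def recently_restarted(started, now, window=timedelta(minutes=15)):
    """Return names of containers whose start time lies within `window` of `now`."""
    return sorted(
        name for name, stamp in started.items()
        if now - parse_started_at(stamp) <= window
    )


def running_containers():
    """Map container name to StartedAt for every running container."""
    ids = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=True
    ).stdout.split()
    if not ids:
        return {}
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{.Name}} {{.State.StartedAt}}", *ids],
        capture_output=True, text=True, check=True,
    ).stdout
    result = {}
    for line in out.splitlines():
        name, _, stamp = line.partition(" ")
        result[name.lstrip("/")] = stamp
    return result
```

On a Docker host, `recently_restarted(running_containers(), datetime.now(timezone.utc))` lists the containers that started within the last 15 minutes. If containers deployed days ago keep showing up in that list, something is restarting them.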

The system event log turned up some odd entries:

Index              : 1587
EntryType          : Error
InstanceId         : 1000
Message            : Faulting application name: vmcompute.exe, version: 10.0.19041.423, time stamp: 0x48d9560b
                     Faulting module name: ntdll.dll, version: 10.0.19041.423, time stamp: 0x06701e03
                     Exception code: 0xc0000409
                     Fault offset: 0x00000000000a10c0
                     Faulting process id: 0xd50
                     Faulting application start time: 0x01d67c1965d71782
                     Faulting application path: C:\Windows\system32\vmcompute.exe
                     Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
                     Report Id: 10a05fe9-0ee6-4479-a3cb-64737a981b8f
                     Faulting package full name:
                     Faulting package-relative application ID:
Category           : Application Crashing Events
CategoryNumber     : 100
ReplacementStrings : {vmcompute.exe, 10.0.19041.423, 48d9560b, ntdll.dll...}
Source             : Application Error
TimeGenerated      : 8/27/2020 10:50:16 AM
TimeWritten        : 8/27/2020 10:50:16 AM
UserName           :

---

vmcompute.exe
10.0.19041.423
48d9560b
ntdll.dll
10.0.19041.423
06701e03
c0000409
00000000000a10c0
d50
01d67c1965d71782
C:\Windows\system32\vmcompute.exe
C:\Windows\SYSTEM32\ntdll.dll
10a05fe9-0ee6-4479-a3cb-64737a981b8f

---

Index              : 1588
EntryType          : Information
InstanceId         : 1001
Message            : Fault bucket 1547995663916101540, type 5
                     Event Name: BEX64
                     Response: Not available
                     Cab Id: 0

                     Problem signature:
                     P1: vmcompute.exe
                     P2: 10.0.19041.423
                     P3: 48d9560b
                     P4: ntdll.dll
                     P5: 10.0.19041.423
                     P6: 06701e03
                     P7: 00000000000a10c0
                     P8: c0000409
                     P9: 000000000000000a
                     P10:

                     Attached files:
                     \\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERDAE2.tmp.dmp
                     \\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERDB12.tmp.WERInternalMetadata.xml
                     \\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERDB22.tmp.xml
                     \\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERDB30.tmp.csv
                     \\?\C:\ProgramData\Microsoft\Windows\WER\Temp\WERDB40.tmp.txt

                     These files may be available here:
                     \\?\C:\ProgramData\Microsoft\Windows\WER\ReportArchive\AppCrash_vmcompute.exe_9c15ce55f33b471359c45c3ec826a7815a9831_10679ad2_c54e53a8-72ec-4c49-a56e-89a3b91435c5

                     Analysis symbol:
                     Rechecking for solution: 0
                     Report Id: 10a05fe9-0ee6-4479-a3cb-64737a981b8f
                     Report Status: 268435456
                     Hashed bucket: 4e882a2ea471a109b57b95dacd906fa4
                     Cab Guid: 0
Category           : (0)
CategoryNumber     : 0
ReplacementStrings : {1547995663916101540, 5, BEX64, Not available...}
Source             : Windows Error Reporting
TimeGenerated      : 8/27/2020 10:50:17 AM
TimeWritten        : 8/27/2020 10:50:17 AM
UserName           :

---

Index              : 1589
EntryType          : Error
InstanceId         : 11
Message            : failed wait [spanID=d5ddc0b2c2a28cae traceID=42c67d7dfb568ffef1837f1f081b946d error=hcsshim::Process::waitBackground
                     f152b8a4cafe60afbb05a45a71f3c1461b3266223375bcc16dc49892ca1b7e2b:2824: lost communication with compute service]
Category           : (0)
CategoryNumber     : 0
ReplacementStrings : {failed wait, spanID=d5ddc0b2c2a28cae traceID=42c67d7dfb568ffef1837f1f081b946d error=hcsshim::Process::waitBackground
                     f152b8a4cafe60afbb05a45a71f3c1461b3266223375bcc16dc49892ca1b7e2b:2824: lost communication with compute service}
Source             : docker
TimeGenerated      : 8/27/2020 10:50:17 AM
TimeWritten        : 8/27/2020 10:50:17 AM
UserName           :

---

Index              : 1590
EntryType          : Warning
InstanceId         : 11
Message            : Wait() failed (container may have been killed) [process=init error=process 2824 in container f152b8a4cafe60afbb05a45a71f3c1461b3266223375bcc16dc49892ca1b7e2b
                     encountered an error during hcsshim::Process::waitBackground: lost communication with compute service namespace=moby module=libcontainerd
                     container=f152b8a4cafe60afbb05a45a71f3c1461b3266223375bcc16dc49892ca1b7e2b]
Category           : (0)
CategoryNumber     : 0
ReplacementStrings : {Wait() failed (container may have been killed), process=init error=process 2824 in container f152b8a4cafe60afbb05a45a71f3c1461b3266223375bcc16dc49892ca1b7e2b
                     encountered an error during hcsshim::Process::waitBackground: lost communication with compute service namespace=moby module=libcontainerd
                     container=f152b8a4cafe60afbb05a45a71f3c1461b3266223375bcc16dc49892ca1b7e2b}
Source             : docker
TimeGenerated      : 8/27/2020 10:50:17 AM
TimeWritten        : 8/27/2020 10:50:17 AM
UserName           :
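To compare crash entries like the first one above across several hosts, the Message text can be broken into fields. The sketch below is only an illustration: the helper names are hypothetical, and it assumes that "Faulting application start time" is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC) written in hexadecimal.

```python
from datetime import datetime, timedelta, timezone

# Start of the Windows FILETIME clock.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# A few lines copied from the vmcompute.exe crash entry above.
SAMPLE_MESSAGE = """Faulting application name: vmcompute.exe, version: 10.0.19041.423, time stamp: 0x48d9560b
Faulting module name: ntdll.dll, version: 10.0.19041.423, time stamp: 0x06701e03
Exception code: 0xc0000409
Faulting application start time: 0x01d67c1965d71782
Faulting package full name:"""


def parse_crash_message(message):
    """Collect 'key: value' pairs from the message; lines without a value are skipped."""
    fields = {}
    for line in message.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value.strip()
    return fields


def filetime_to_datetime(hex_value):
    """Convert a hex FILETIME string (assumed encoding) to an aware UTC datetime."""
    ticks = int(hex_value, 16)  # 100-nanosecond intervals
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


fields = parse_crash_message(SAMPLE_MESSAGE)
app_name = fields["Faulting application name"].split(",")[0]
started = filetime_to_datetime(fields["Faulting application start time"])
print(app_name, fields["Exception code"], started.isoformat())
```

Subtracting the decoded start time from the entry's TimeGenerated (after converting that to UTC) gives a rough idea of how long vmcompute.exe stayed up before each crash.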

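vmcompute.exe is essentially the process that does Hyper-V's work, so it was strange that Hyper-V would be the thing interrupting the containers. I chased this for several days without a lead: how does something that was working fine suddenly break? I was about to open a support case with Microsoft when it hit me that only one thing had been done to these Docker hosts recently: a VMware Tools upgrade. Could VMware Tools be the cause? I added "VMware Tools" to my search keywords, and the result nearly made my head spin: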

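Losing connection to Docker daemon after a short period of time (10-15 minutes). Issue related to VMWare Tools update. The VMware Tools v11.0.1 mentioned in this issue, #5044, is exactly the version we had installed, and the scenario matched ours completely. I went straight to VMware to download a version from 11.0.6 onward. (The latest at the time of writing was VMware Tools 11.1.5.)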
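When there are several Docker hosts to check, the version comparison is easy to script. This is a rough sketch under a loud assumption: it treats every VMware Tools version below 11.0.6 as suspect, although the issue only confirms 11.0.1 as affected and I do not know where the problem was first introduced. The function names are made up for illustration, and the version strings have to be collected from each host by other means.

```python
# Assumption: 11.0.6 is the first version worth upgrading to.
FIRST_GOOD_VERSION = (11, 0, 6)


def parse_version(text):
    """Turn '11.0.1' or '11.0.1.14773994' into a comparable (major, minor, patch) tuple."""
    return tuple(int(part) for part in text.strip().split(".")[:3])


def needs_upgrade(installed):
    """True when the installed version is older than FIRST_GOOD_VERSION."""
    return parse_version(installed) < FIRST_GOOD_VERSION


for version in ("11.0.1", "11.0.6", "11.1.5"):
    print(version, "upgrade" if needs_upgrade(version) else "ok")
```

Comparing tuples of integers avoids the trap of comparing version strings as text, where "11.0.10" would sort before "11.0.6".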

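This was actually the second time I had run into this. Being the forgetful type, I had completely forgotten the earlier episode. Not writing it down this time would be letting myself down.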

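Come to think of it, there was also an earlier time when Windows Update was the culprit. Windows breaking its own components is one thing, but now even VMware is breaking Windows containers. It really is killing me.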
