How to Join an On-Premises Kubernetes Cluster to Azure Arc, Using a Kubernetes Cluster Built with Kubeadm as an Example

Earlier we used Kind and MicroK8s to set up Kubernetes clusters and verify the integration of an on-premises Kubernetes cluster with Azure Arc, and the process went fairly smoothly. Next, on the same Surface Pro host, I used Hyper-V and Kubeadm to build a Kubernetes cluster and ran the verification again. Of course, the fact that this post exists means problems showed up along the way.

$ kubectl get nodes -o wide
NAME     STATUS     ROLES           AGE    VERSION    INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                                    KERNEL-VERSION      CONTAINER-RUNTIME
lnode    Ready      control-plane   111d   v1.26.5    192.168.8.150    <none>        Ubuntu 22.04.2 LTS                          5.15.0-88-generic   containerd://1.6.21
lnode2   Ready      <none>          2d5h   v1.26.11   192.168.8.153    <none>        Ubuntu 22.04.3 LTS                          5.15.0-88-generic   containerd://1.6.24

The setup steps for vanilla Kubernetes were all the same as for MicroK8s. But at the az connectedk8s connect step, I got a different result:

$ az connectedk8s connect --name "kubernetes" --resource-group "AzureArcTest" --location "eastus" --tags "Project=azure_arc_k8s"
This operation might take a while...

The outbound network connectivity check has failed for the endpoint - https://eastus.obo.arc.azure.com:8084/
This will affect the "cluster-connect" feature. If you are planning to use "cluster-connect" functionality , please ensure outbound connectivity to the above endpoint.

Error: We found an issue with outbound network connectivity from the cluster to the endpoints required for onboarding.
Please ensure to meet the following network requirements 'https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/quickstart-connect-cluster?tabs=azure-cli#meet-network-requirements'
If your cluster is behind an outbound proxy server, please ensure that you have passed proxy parameters during the onboarding of your cluster.
For more details visit 'https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/quickstart-connect-cluster?tabs=azure-cli#connect-using-an-outbound-proxy-server'

The pre-check result logs logs have been saved at this path:/home/kkbruce/.azure/pre_onboarding_check_logs/Arc-Kind-Demo-Sun-Nov-19-13.56.47-2023 .
These logs can be attached while filing a support ticket for further assistance.

One or more pre-onboarding diagnostic checks failed and hence not proceeding with cluster onboarding. Please resolve them and try onboarding again.

I got a very strange error message: the cluster couldn't reach one of the Azure Arc service endpoints (https://eastus.obo.arc.azure.com:8084/)!

The firewall on my Ubuntu hosts was disabled by default when I installed the Kubernetes cluster, so the firewall couldn't be the cause. Besides, how did the Azure CLI get installed? How did az login complete? A flood of question marks flashed through my mind.
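
For the record, a quick way to confirm the firewall state on Ubuntu (assuming the default ufw frontend):

$ sudo ufw status
Status: inactive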

Then run the simplest possible curl test:

lnode:~$ curl ifconfig.me
180.217.16.163

lnode2:~$ curl ifconfig.me
180.217.16.163

Then try an az command:

# Responds with a single blank line
# The resources were deleted after the previous PoC, so the resource group is currently empty
az connectedk8s list --resource-group AzureArcTest --output table

I tested again and again following the documentation referenced in the error message, but still couldn't find where the core problem was. With no clue at all, I originally wanted to open a case with MS Support, but I didn't know how this case should be filed: Azure Arc is in the cloud, yet what we're onboarding is a Linux and Kubernetes cluster sitting on-premises. Fortunately I heard through the grapevine that MS Support had recently started handling Azure Arc onboarding issues, so the case was opened with MS Support smoothly.

We started by checking the network, then the connectivity:

# https://learn.microsoft.com/en-us/azure/azure-arc/kubernetes/network-requirements?tabs=azure-cloud
nc -vz management.azure.com 443
nc -vz eastus.dp.kubernetesconfiguration.azure.com 443
nc -vz login.microsoftonline.com 443
nc -vz eastus.login.microsoft.com 443
nc -vz login.windows.net 443
nc -vz mcr.microsoft.com 443
nc -vz gbl.his.arc.azure.com 443
nc -vz k8connecthelm.azureedge.net 443
nc -vz guestnotificationservice.azure.com 443
nc -vz sts.windows.net 443
nc -vz k8sconnectcsp.azureedge.net 443
nc -vz eastus.obo.arc.azure.com 8084
nc -vz eastus.obo.arc.azure.com 443
nc -vz dl.k8s.io 443
Connection to management.azure.com (52.231.22.6) 443 port [tcp/https] succeeded!
Connection to eastus.dp.kubernetesconfiguration.azure.com (20.49.109.33) 443 port [tcp/https] succeeded!
Connection to login.microsoftonline.com (40.126.38.22) 443 port [tcp/https] succeeded!
Connection to eastus.login.microsoft.com (20.190.151.2) 443 port [tcp/https] succeeded!
Connection to login.windows.net (20.190.166.132) 443 port [tcp/https] succeeded!
Connection to mcr.microsoft.com (204.79.197.219) 443 port [tcp/https] succeeded!
Connection to gbl.his.arc.azure.com (20.6.141.126) 443 port [tcp/https] succeeded!
Connection to k8connecthelm.azureedge.net (13.107.246.73) 443 port [tcp/https] succeeded!
Connection to guestnotificationservice.azure.com (23.98.104.13) 443 port [tcp/https] succeeded!
Connection to sts.windows.net (40.126.38.20) 443 port [tcp/https] succeeded!
Connection to k8sconnectcsp.azureedge.net (13.107.213.73) 443 port [tcp/https] succeeded!
Connection to eastus.obo.arc.azure.com (52.146.79.132) 8084 port [tcp/*] succeeded!
Connection to eastus.obo.arc.azure.com (52.146.79.132) 443 port [tcp/https] succeeded!
Connection to dl.k8s.io (34.107.204.206) 443 port [tcp/https] succeeded!

# https://guestnotificationservice.azure.com/urls/allowlist?api-version=2020-01-01&location=eastus
nc -vz azgnrelay-eastus-l1.servicebus.windows.net 443
nc -vz g0-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g1-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g2-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g3-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g4-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g5-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g6-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g7-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g8-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g9-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g10-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g11-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g12-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g13-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g14-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g15-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g16-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g17-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g18-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g19-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g20-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g21-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g22-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz g23-prod-dm2-012-sb.servicebus.windows.net 443
nc -vz azgn-eastus-public-1s-weuam3-010.servicebus.windows.net 443
nc -vz g0-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g1-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g2-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g3-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g4-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g5-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g6-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g7-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g8-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g9-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g10-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g11-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g12-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g13-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g14-prod-am3-010-sb.servicebus.windows.net 443
nc -vz g15-prod-am3-010-sb.servicebus.windows.net 443
Connection to azgnrelay-eastus-l1.servicebus.windows.net (40.122.115.96) 443 port [tcp/https] succeeded!
Connection to g0-prod-dm2-012-sb.servicebus.windows.net (40.78.128.187) 443 port [tcp/https] succeeded!
Connection to g1-prod-dm2-012-sb.servicebus.windows.net (40.122.125.202) 443 port [tcp/https] succeeded!
Connection to g2-prod-dm2-012-sb.servicebus.windows.net (40.78.146.148) 443 port [tcp/https] succeeded!
Connection to g3-prod-dm2-012-sb.servicebus.windows.net (40.78.147.229) 443 port [tcp/https] succeeded!
Connection to g4-prod-dm2-012-sb.servicebus.windows.net (40.78.145.132) 443 port [tcp/https] succeeded!
Connection to g5-prod-dm2-012-sb.servicebus.windows.net (40.78.148.5) 443 port [tcp/https] succeeded!
Connection to g6-prod-dm2-012-sb.servicebus.windows.net (40.78.149.97) 443 port [tcp/https] succeeded!
Connection to g7-prod-dm2-012-sb.servicebus.windows.net (40.78.150.16) 443 port [tcp/https] succeeded!
Connection to g8-prod-dm2-012-sb.servicebus.windows.net (40.122.201.83) 443 port [tcp/https] succeeded!
Connection to g9-prod-dm2-012-sb.servicebus.windows.net (40.122.200.123) 443 port [tcp/https] succeeded!
Connection to g10-prod-dm2-012-sb.servicebus.windows.net (40.122.206.140) 443 port [tcp/https] succeeded!
Connection to g11-prod-dm2-012-sb.servicebus.windows.net (40.77.101.129) 443 port [tcp/https] succeeded!
Connection to g12-prod-dm2-012-sb.servicebus.windows.net (40.77.101.153) 443 port [tcp/https] succeeded!
Connection to g13-prod-dm2-012-sb.servicebus.windows.net (40.77.103.44) 443 port [tcp/https] succeeded!
Connection to g14-prod-dm2-012-sb.servicebus.windows.net (40.77.20.82) 443 port [tcp/https] succeeded!
Connection to g15-prod-dm2-012-sb.servicebus.windows.net (40.77.20.95) 443 port [tcp/https] succeeded!
Connection to g16-prod-dm2-012-sb.servicebus.windows.net (13.89.96.42) 443 port [tcp/https] succeeded!
Connection to g17-prod-dm2-012-sb.servicebus.windows.net (13.89.96.60) 443 port [tcp/https] succeeded!
Connection to g18-prod-dm2-012-sb.servicebus.windows.net (13.89.96.156) 443 port [tcp/https] succeeded!
Connection to g19-prod-dm2-012-sb.servicebus.windows.net (13.89.96.254) 443 port [tcp/https] succeeded!
Connection to g20-prod-dm2-012-sb.servicebus.windows.net (40.78.146.84) 443 port [tcp/https] succeeded!
Connection to g21-prod-dm2-012-sb.servicebus.windows.net (40.78.147.51) 443 port [tcp/https] succeeded!
Connection to g22-prod-dm2-012-sb.servicebus.windows.net (40.78.147.74) 443 port [tcp/https] succeeded!
Connection to g23-prod-dm2-012-sb.servicebus.windows.net (40.78.147.142) 443 port [tcp/https] succeeded!
Connection to azgn-eastus-public-1s-weuam3-010.servicebus.windows.net (168.63.24.14) 443 port [tcp/https] succeeded!
Connection to g0-prod-am3-010-sb.servicebus.windows.net (23.97.239.39) 443 port [tcp/https] succeeded!
Connection to g1-prod-am3-010-sb.servicebus.windows.net (23.97.236.88) 443 port [tcp/https] succeeded!
Connection to g2-prod-am3-010-sb.servicebus.windows.net (23.97.240.158) 443 port [tcp/https] succeeded!
Connection to g3-prod-am3-010-sb.servicebus.windows.net (23.97.240.181) 443 port [tcp/https] succeeded!
Connection to g4-prod-am3-010-sb.servicebus.windows.net (23.97.240.223) 443 port [tcp/https] succeeded!
Connection to g5-prod-am3-010-sb.servicebus.windows.net (23.97.240.230) 443 port [tcp/https] succeeded!
Connection to g6-prod-am3-010-sb.servicebus.windows.net (23.97.240.242) 443 port [tcp/https] succeeded!
Connection to g7-prod-am3-010-sb.servicebus.windows.net (23.97.240.255) 443 port [tcp/https] succeeded!
Connection to g8-prod-am3-010-sb.servicebus.windows.net (23.97.240.10) 443 port [tcp/https] succeeded!
Connection to g9-prod-am3-010-sb.servicebus.windows.net (23.97.247.112) 443 port [tcp/https] succeeded!
Connection to g10-prod-am3-010-sb.servicebus.windows.net (23.97.247.200) 443 port [tcp/https] succeeded!
Connection to g11-prod-am3-010-sb.servicebus.windows.net (23.97.240.36) 443 port [tcp/https] succeeded!
Connection to g12-prod-am3-010-sb.servicebus.windows.net (23.97.240.62) 443 port [tcp/https] succeeded!
Connection to g13-prod-am3-010-sb.servicebus.windows.net (23.97.241.25) 443 port [tcp/https] succeeded!
Connection to g14-prod-am3-010-sb.servicebus.windows.net (23.97.246.22) 443 port [tcp/https] succeeded!
Connection to g15-prod-am3-010-sb.servicebus.windows.net (23.97.243.77) 443 port [tcp/https] succeeded!

Here I learned a Linux command, nc. The list even covers the endpoint from the error message (nc -vz eastus.obo.arc.azure.com 8084), and you can see that outbound connectivity from the VM was perfectly normal.
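
Since the required-endpoints list is long, a small loop saves typing. A minimal sketch, assuming an endpoints.txt file you prepare yourself with one "host port" pair per line:

# endpoints.txt: one "host port" pair per line, e.g. "management.azure.com 443"
while read -r host port; do
  nc -vz -w 5 "$host" "$port"   # -w 5: give up after 5 seconds
done < endpoints.txt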

The intel so far:

VM (Hyper-V) <--> Internet

From the earlier PoC we know that Azure Arc deploys a set of related Pods. Could the Pods' outbound connectivity be restricted?

Pods <--?--> K8s <--> VM(Hyper-V) <--> Internet
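
One quick way to test this hypothesis is to hit the failing endpoint from inside the cluster with a throwaway Pod. A sketch, not what I ran at the time (curl-test is an arbitrary Pod name, using the public curlimages/curl image):

# Run curl from inside the cluster; the Pod is deleted when the command exits
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -sv --max-time 10 https://eastus.obo.arc.azure.com:8084/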

The Kubernetes website has an article, Debugging DNS Resolution, for troubleshooting DNS problems inside Pods. And sure enough, there were abnormal messages.
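
For context, the exec commands below rely on the dnsutils Pod from that article's first step (the same step is shown again for MicroK8s later):

lnode:~$ kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml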

lnode:~$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
[ERROR] plugin/errors: 2 7451145207094757858.7952584762411664970. HINFO: read udp 192.168.241.193:45072->192.168.8.1:53: i/o timeout
[INFO] 127.0.0.1:56746 - 1673 "HINFO IN 7451145207094757858.7952584762411664970. udp 57 false 512" - - 0 2.000556367s
[ERROR] plugin/errors: 2 7451145207094757858.7952584762411664970. HINFO: read udp 192.168.241.193:38480->192.168.8.1:53: i/o timeout
[INFO] 192.168.175.88:57347 - 49494 "A IN www.google.com.tw.default.svc.cluster.local. udp 61 false 512" NXDOMAIN qr,aa,rd 154 0.000172442s
[INFO] 192.168.175.88:56334 - 30416 "A IN www.google.com.tw.svc.cluster.local. udp 53 false 512" NXDOMAIN qr,aa,rd 146 0.000092021s
[INFO] 192.168.175.88:43556 - 13496 "A IN www.google.com.tw. udp 35 false 512" - - 0 2.000505495s
[ERROR] plugin/errors: 2 www.google.com.tw. A: read udp 192.168.241.193:53476->192.168.8.1:53: i/o timeout
[INFO] 192.168.175.88:47077 - 41056 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd 106 0.000082302s
[INFO] 192.168.175.88:49554 - 15749 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd 155 0.000151019s
[INFO] 192.168.175.88:59269 - 56849 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd 106 0.000084726s
[INFO] 127.0.0.1:53771 - 27691 "HINFO IN 6739484294044022710.9221370444002970242. udp 57 false 512" - - 0 2.001135748s
[ERROR] plugin/errors: 2 6739484294044022710.9221370444002970242. HINFO: read udp 192.168.241.202:37663->192.168.8.1:53: i/o timeout
[INFO] 127.0.0.1:53127 - 43654 "HINFO IN 6739484294044022710.9221370444002970242. udp 57 false 512" - - 0 2.000475744s
[ERROR] plugin/errors: 2 6739484294044022710.9221370444002970242. HINFO: read udp 192.168.241.202:53341->192.168.8.1:53: i/o timeout
[INFO] 127.0.0.1:48478 - 38163 "HINFO IN 6739484294044022710.9221370444002970242. udp 57 false 512" - - 0 2.000714319s
[ERROR] plugin/errors: 2 6739484294044022710.9221370444002970242. HINFO: read udp 192.168.241.202:46963->192.168.8.1:53: i/o timeout
[INFO] 127.0.0.1:41160 - 62351 "HINFO IN 6739484294044022710.9221370444002970242. udp 57 false 512" - - 0 2.000483071s
[ERROR] plugin/errors: 2 6739484294044022710.9221370444002970242. HINFO: read udp 192.168.241.202:56141->192.168.8.1:53: i/o timeout
[INFO] 192.168.175.88:35490 - 63508 "A IN www.google.com.tw.cluster.local. udp 49 false 512" NXDOMAIN qr,aa,rd 142 0.000103233s
[INFO] 192.168.175.88:54181 - 23691 "A IN kubernetes.default.default.svc.cluster.local. udp 62 false 512" NXDOMAIN qr,aa,rd 155 0.000166224s


lnode:~$ kubectl exec -i -t dnsutils -- nslookup www.google.com.tw
Server:         10.96.0.10
Address:        10.96.0.10#53

** server can't find www.google.com.tw: SERVFAIL

Since MS Support aren't Kubernetes experts either, the case stalled here. Meanwhile, I kept digging into why Pods couldn't resolve external domains.


At this point the story has to detour back to post 05 on MicroK8s for a moment.

Remember the previous post on joining MicroK8s to Azure Arc? That MicroK8s post was actually written much later than this one; in the normal publishing order, 06 Kubeadm should have come before 05 MicroK8s. Stuck for ages on the Kubeadm-built cluster with no further leads, I had a sudden idea: build a control group. With a different Kubernetes distribution (Kubeadm vs. MicroK8s) on the same machine, the same network environment, the same operating system, and identical baseline conditions, would joining Azure Arc hit the same problem? As it turned out, MicroK8s joined the Azure Arc list without a hitch, which is why the 05 MicroK8s post was published first.

During the investigation I also discussed the distribution question with MS Support. Although a Kubernetes cluster we built ourselves on Ubuntu Server with Kubeadm isn't on the documented list, I believed a cluster configured by Kubeadm absolutely conforms to the CNCF specifications, and after checking with the Azure Arc engineers, the reply confirmed that this combination is fine.

This made me think through a few things and rule out a few others:

Pods <-- DNS resolution OK --> K8s (MicroK8s) <--> VM (Hyper-V) <--> Internet
Pods <-- DNS resolution failing --> K8s (Kubeadm - Kubernetes) <--> VM (Hyper-V) <--> Internet

What did the MicroK8s cluster do differently so that DNS resolution inside its Pods works?

mk8s:~$ kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
pod/dnsutils created

mk8s:~$ kubectl get pods
NAME       READY   STATUS    RESTARTS   AGE
dnsutils   1/1     Running   0          44m

mk8s:~$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server:         10.152.183.10
Address:        10.152.183.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.152.183.1

mk8s:~$ kubectl exec -ti dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.152.183.10
options ndots:5

mk8s:~$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS      AGE
coredns-864597b5fd-tvh7q   1/1     Running   2 (80m ago)   16h

mk8s:~$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration SHA512 = deb3871c00828b25727978460b261c74de0519acfb0c61c7813cc2cea8e445ddeb98d0e8f8c7bf945861f2a8b73a581c56067b8fe22b9acd076af72a94958de2
CoreDNS-1.10.1
linux/amd64, go1.20, 055b2c3

mk8s:~$ kubectl get svc --namespace=kube-system
NAME       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.152.183.10   <none>        53/UDP,53/TCP,9153/TCP   16h

mk8s:~$ kubectl get endpoints kube-dns --namespace=kube-system
NAME       ENDPOINTS                                           AGE
kube-dns   10.1.215.200:53,10.1.215.200:53,10.1.215.200:9153   16h

mk8s:~$ kubectl -n kube-system edit configmap coredns
Edit cancelled, no changes made.

mk8s:~$ kubectl describe clusterrole system:coredns -n kube-system
Error from server (NotFound): clusterroles.rbac.authorization.k8s.io "system:coredns" not found

mk8s:~$ kubectl exec -i -t dnsutils -- nslookup www.google.com.tw
Server:         10.152.183.10
Address:        10.152.183.10#53

Non-authoritative answer:
Name:   www.google.com.tw
Address: 142.251.42.227

DNS! DNS! DNS!

Looking back carefully at the vanilla Kubeadm cluster's logs, there were many 192.168.8.1:53: i/o timeout entries. Why didn't the MicroK8s cluster have any, with queries resolving on the first try?

After comparing back and forth, it turned out I had done one extra thing on the MicroK8s VM. You might think it was a setting in /etc/resolv.conf. Wrong! It was /etc/netplan/00-installer-config.yaml:

$ netplan get all
network:
  version: 2
  ethernets:
    eth0:
      addresses:
      - "192.168.8.182/24"
      nameservers:
        addresses:
        - 192.168.8.1
        - 8.8.8.8
        - 1.1.1.1
      routes:
      - to: "default"
        via: "192.168.8.1"

You can also check with the resolvectl status command:

$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (eth0)
    Current Scopes: DNS
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 192.168.8.1
       DNS Servers: 192.168.8.1 8.8.8.8 1.1.1.1

Because the 06 Kubeadm cluster was stuck, and although I had traced it to Pods failing DNS queries I still couldn't find a fix, when installing MicroK8s I instinctively, without much thought, added two commonly used external nameservers. You can't tell from the Pods' logs, but I believe it was these two DNS servers that helped the Pods resolve the correct DNS records.
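
Applying the same idea back on the Kubeadm VM, a minimal sketch follows, assuming the same netplan file name as on the MicroK8s VM; the CoreDNS restart is my own addition so the change is picked up without waiting:

# Edit /etc/netplan/00-installer-config.yaml to add public nameservers, e.g.:
#   nameservers:
#     addresses: [192.168.8.1, 8.8.8.8, 1.1.1.1]
$ sudo netplan apply

# Restart CoreDNS so it re-reads the node's upstream resolvers
$ kubectl -n kube-system rollout restart deployment coredns

After the change, the same query on the Kubeadm cluster succeeds: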

$ kubectl exec -i -t dnsutils -- nslookup www.google.com.tw
Server:         10.96.0.10
Address:        10.96.0.10#53

Non-authoritative answer:
Name:   www.google.com.tw
Address: 172.217.163.35

$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
[INFO] 192.168.175.121:59085 - 51543 "A IN www.google.com.tw.default.svc.cluster.local. udp 61 false 512" NXDOMAIN qr,aa,rd 154 0.000130526s
[INFO] 192.168.175.121:49223 - 38942 "A IN www.google.com.tw.svc.cluster.local. udp 53 false 512" NXDOMAIN qr,aa,rd 146 0.000085043s
[INFO] 192.168.175.121:33974 - 36450 "A IN www.google.com.tw.cluster.local. udp 49 false 512" NXDOMAIN qr,aa,rd 142 0.000055078s
[INFO] 192.168.175.121:35588 - 25873 "A IN www.google.com.tw. udp 35 false 512" NOERROR qr,rd,ra 68 0.025888673s

Think back to the message we got at the start, 192.168.8.1:53: i/o timeout. The log stated quite clearly that the DNS server provided by the current network environment couldn't be used for queries. Now that I supply additional external DNS servers, queries also go out to 8.8.8.8 and 1.1.1.1, and with a healthy network the external DNS responds with the required IP address. Looked at from this angle, it all makes perfect sense.
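
This also matches how CoreDNS is set up by kubeadm: the stock Corefile forwards non-cluster names to the node's /etc/resolv.conf, so Pods inherit whatever upstream DNS the node uses. A typical excerpt (exact contents may vary by version):

$ kubectl -n kube-system get configmap coredns -o yaml | grep -A 2 forward
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }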

Later I ran into a similar situation with a Windows Pod: from inside the Windows Pod, the IP address of external names such as www.google.com couldn't be resolved. Going to Ethernet → TCP/IPv4 → DNS and manually adding two DNS entries fixed it.

Sometimes solving a problem takes a bit of luck. Having finally traced it to Pods failing DNS queries, I kept drilling into Kubernetes Pods and DNS and forgot to step back and look at the wider world. Another lesson: when debugging in Kubernetes, every message deserves careful reading and thought about why it was printed and what it's trying to say. Miss one small piece of a message and the problem stays unsolved; see and understand that small piece and the problem gets solved.
