Fixing the IP Pool That Cannot Be Modified After Upgrading Calico 3.27 to a Newer Version
A quick recap: after upgrading Calico 3.27 to Calico 3.28 or any later version, we kept hitting a situation where the default IP Pool reverts to 192.168.x.x and cannot be changed.
$ calicoctl version
Client Version: v3.27.4
Git commit: 2183fee02
Cluster Version: v3.27.4
Cluster Type: typha,kdd,k8s,operator,bgp,kubeadm
$ calicoctl get ippool -o wide
NAME CIDR NAT IPIPMODE VXLANMODE DISABLED DISABLEBGPEXPORT SELECTOR
new-pool 10.244.0.0/16 true Never CrossSubnet false false all()
Reproducing the Calico 3.28.1 IPPool Issue
We upgrade the Kubernetes Cluster above to Calico 3.30, the latest release at the time of writing. (Don't follow along and run this yet!)
curl https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/operator-crds.yaml -O
curl https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/tigera-operator.yaml -O
kubectl apply --server-side --force-conflicts -f operator-crds.yaml
kubectl apply --server-side --force-conflicts -f tigera-operator.yaml
kubectl apply -f - <<EOF
apiVersion: operator.tigera.io/v1
kind: Goldmane
metadata:
  name: default
---
apiVersion: operator.tigera.io/v1
kind: Whisker
metadata:
  name: default
EOF
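While the rollout proceeds, one way to watch the operator bring each component up is the TigeraStatus resources that tigera-operator maintains. A quick sketch (column layout may vary by version):
# Watch each operator-managed component until it reports Available
$ kubectl get tigerastatus
# Or keep watching until everything converges:
$ kubectl get tigerastatus -w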
calico-node
The update is a bit slow, so give it some time.
$ kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-apiserver calico-apiserver-694c587998-fpknn 1/1 Running 0 69s
calico-apiserver calico-apiserver-694c587998-jfr8z 1/1 Running 0 60s
calico-system calico-kube-controllers-557c794b8d-9zwr6 1/1 Running 0 65s
calico-system calico-node-2zpx7 1/1 Running 1 (19d ago) 19d
calico-system calico-node-7678c 1/1 Running 2 (19d ago) 19d
calico-system calico-node-llqhh 0/1 Running 0 66s
calico-system calico-typha-6f49c7766d-jgktb 1/1 Running 0 66s
calico-system calico-typha-6f49c7766d-rn6hv 1/1 Running 0 66s
calico-system csi-node-driver-rhgd5 2/2 Running 0 54s
calico-system csi-node-driver-tfbmv 2/2 Running 0 25s
calico-system csi-node-driver-xjd55 2/2 Running 0 65s
calico-system goldmane-5f56496f4c-npgpf 1/1 Running 0 68s
calico-system whisker-58796f545-dw7p6 2/2 Running 0 29s
tigera-operator tigera-operator-747864d56d-vn8mf 1/1 Running 0 97s
# omitted
calicoctl
Let's also update calicoctl to 3.30 while we're at it.
$ cd /usr/local/bin/
$ sudo curl -L https://github.com/projectcalico/calico/releases/download/v3.30.2/calicoctl-linux-amd64 -o calicoctl
$ sudo chmod +x ./calicoctl
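As a side note, the Calico docs also describe installing this same binary as a kubectl plugin. A sketch, in case you prefer kubectl calico over a standalone command:
$ sudo curl -L https://github.com/projectcalico/calico/releases/download/v3.30.2/calicoctl-linux-amd64 -o /usr/local/bin/kubectl-calico
$ sudo chmod +x /usr/local/bin/kubectl-calico
$ kubectl calico get ippool -o wide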
And there it is: the IP Pool issue from the recap appears.
$ calicoctl get ippool -o wide
NAME CIDR NAT IPIPMODE VXLANMODE DISABLED DISABLEBGPEXPORT SELECTOR ASSIGNMENTMODE
default-ipv4-ippool 192.168.0.0/16 true Never CrossSubnet false false all() Automatic
new-pool 10.244.0.0/16 true Never CrossSubnet false false all() Automatic
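A hint at where the resurrected pool comes from (this becomes important later in the article): the operator reconciles pools from the Installation resource, which you can inspect directly. A sketch:
# The operator's desired pool set lives in the Installation resource
$ kubectl get installation default -o yaml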
Upgrading doesn't work, so what about deleting and reinstalling?
Restore the snapshot to the state where Calico is still at version 3.27 and the default-ipv4-ippool (192.168.x.x) has been deleted, leaving only new-pool (10.244.x.x).
$ calicoctl get ippool -o wide
NAME CIDR NAT IPIPMODE VXLANMODE DISABLED DISABLEBGPEXPORT SELECTOR
new-pool 10.244.0.0/16 true Never CrossSubnet false false all()
After going back and forth on it: what if I skip the direct upgrade path and redo the Calico CNI from scratch? That is, remove Calico 3.27 entirely and then install Calico 3.30 fresh?
Following the original Calico 3.27 installation docs, we perform the removal steps in reverse:
# First put every node into maintenance mode
$ kubectl drain <nodename> --ignore-daemonsets
# Clean out all of the 3.27 IP Pool settings
$ calicoctl delete pool new-pool
$ kubectl delete -f custom-resources.yaml
# This hangs at `apiserver.operator.tigera.io "default" deleted`; open another terminal session and run uncordon so it can continue
$ kubectl uncordon <nodename>
$ kubectl delete -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.5/manifests/tigera-operator.yaml
You can see that all Pods in the calico-system namespace get deleted, except the csi-node-driver Pod.
$ kubectl get pod -A -o wide
NAMESPACE NAME READY STATUS IP
calico-system csi-node-driver-6hzz9 0/2 Terminating 10.244.214.71
kube-system coredns-668d6bf9bc-qtz6f 0/1 ContainerCreating <none>
kube-system coredns-668d6bf9bc-wxm84 0/1 ContainerCreating <none>
kube-system etcd-twlab-cp01 1/1 Running 192.168.56.10
kube-system kube-apiserver-twlab-cp01 1/1 Running 192.168.56.10
kube-system kube-controller-manager-twlab-cp01 1/1 Running 192.168.56.10
kube-system kube-proxy-n7dfj 1/1 Running 192.168.56.10
kube-system kube-scheduler-twlab-cp01 1/1 Running 192.168.56.10
Here we need to manually force-delete the lingering csi-node-driver Pod on the Control Plane with the --force flag.
$ kubectl delete pod -n calico-system csi-node-driver-6hzz9 --force
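Before reinstalling, a quick sanity check that nothing Calico-related is left behind (a sketch; namespaces may show Terminating for a short while):
$ kubectl get ns | grep -E 'calico|tigera'
$ kubectl get pod -n calico-system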
At this point, Calico 3.27 has been completely removed. Next, install Calico 3.30:
$ kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/operator-crds.yaml
$ kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/tigera-operator.yaml
$ sudo curl https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/custom-resources.yaml -O
# the downloaded custom-resources.yaml defaults to cidr: 192.168.0.0/16
$ kubectl create -f custom-resources.yaml
Don't run kubectl create -f custom-resources.yaml yourself just yet; please read the article to the end first.
With Calico 3.27 deleted, we're effectively back to the initial state before any CNI was installed. Remember that this Kubernetes Cluster's pod network was configured as 192.168.0.0/16, so we first go with the defaults so the CNI matches the pod network and everything comes up.
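For a kubeadm cluster like this one, the pod network CIDR chosen at kubeadm init time can be double-checked in the kubeadm-config ConfigMap (if it was set at init). A sketch:
# podSubnet here should match the IP pool you intend to use
$ kubectl -n kube-system get cm kubeadm-config -o yaml | grep podSubnet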
$ kubectl get pod -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP
calico-apiserver calico-apiserver-857c6f69f9-2t2nn 1/1 Running 0 2m30s 192.168.214.68
calico-apiserver calico-apiserver-857c6f69f9-9rtts 1/1 Running 0 2m30s 192.168.214.71
calico-system calico-kube-controllers-6585585f8b-hpmzf 1/1 Running 0 2m27s 192.168.214.69
calico-system calico-node-srjw7 1/1 Running 0 2m27s 192.168.56.10
calico-system calico-typha-7cc59465f7-wcsr8 1/1 Running 0 2m28s 192.168.56.10
calico-system csi-node-driver-zg8n7 2/2 Running 0 2m27s 192.168.214.65
calico-system goldmane-5f56496f4c-6jdsh 1/1 Running 0 2m28s 192.168.214.70
calico-system whisker-7bdc688f49-c6jhn 2/2 Running 0 2m23s 192.168.214.72
kube-system coredns-668d6bf9bc-57wr8 1/1 Running 0 9m15s 192.168.214.66
kube-system coredns-668d6bf9bc-czbh6 1/1 Running 0 9m15s 192.168.214.64
tigera-operator tigera-operator-747864d56d-qtz6f 1/1 Running 0 3m22s 192.168.56.10
$ calicoctl version
Client Version: v3.30.2
Git commit: cf50b5622
Cluster Version: v3.30.2
Cluster Type: typha,kdd,k8s,operator,bgp,kubeadm
$ calicoctl get ippool -o wide
NAME CIDR NAT IPIPMODE VXLANMODE DISABLED DISABLEBGPEXPORT SELECTOR ASSIGNMENTMODE
default-ipv4-ippool 192.168.0.0/16 true Never CrossSubnet false false all() Automatic
If the Calico Pods start up and are assigned IPs (192.168.0.0/16), the reinstall went fine. That completes the upgrade from Calico 3.27 to Calico 3.30 (even though we took the remove-and-reinstall route).
Migrate from one IP pool to another
Here we go again: we want to switch IP Pools, following the new version of the Migrate from one IP pool to another doc.
$ kubectl edit installation default
spec:
  calicoNetwork:
    ipPools:
    - allowedUses:
      - Workload
      - Tunnel
      assignmentMode: Automatic
      blockSize: 26
      cidr: 192.168.0.0/16
      disableBGPExport: false
      disableNewAllocations: false
      encapsulation: VXLANCrossSubnet
      name: default-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: all()
But I find parts of that doc poorly written, and it's hard to know where to start editing. Comparing against the custom-resources.yaml from the initial install and its initial-ippool structure makes the required change much clearer:
$ cat custom-resources.yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    # change the name and cidr here
    ipPools:
    - name: new-ipv4-ippool
      blockSize: 26
      cidr: 10.244.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
For configurations with multiple ipPools, see the examples in the Create multiple IP pools doc. And that doc revealed something: you can define multiple ipPools in the very first custom-resources.yaml and load them all in one shot! That's why I left the comment above; reading this far, you can probably see where this is going.
In other words, the initial custom-resources.yaml can define multiple different IP Pools, for example:
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 192.168.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
    - name: new-ipv4-ippool
      blockSize: 26
      cidr: 10.244.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
Then I was tempted to just overwrite the Installation configuration directly:
$ kubectl create -f custom-resources.yaml
Error from server (AlreadyExists): error when creating "custom-resources.yaml": installations.operator.tigera.io "default" already exists
$ kubectl apply -f custom-resources.yaml
Warning: resource installations/default is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
installation.operator.tigera.io/default configured
Ha, it seems kubectl apply let me update the Installation configuration successfully.
$ kubectl get ippools
NAME CREATED AT
default-ipv4-ippool 2025-08-19T15:35:03Z
new-ipv4-ippool 2025-08-20T07:41:27Z
We have new-ipv4-ippool now. Let's quickly check what the installation configuration looks like:
$ kubectl edit installation default
spec:
  calicoNetwork:
    ipPools:
    - allowedUses:
      - Workload
      - Tunnel
      assignmentMode: Automatic
      blockSize: 26
      cidr: 192.168.0.0/16
      disableBGPExport: false
      disableNewAllocations: false
      encapsulation: VXLANCrossSubnet
      name: default-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: all()
    - allowedUses:
      - Workload
      - Tunnel
      assignmentMode: Automatic
      blockSize: 26
      cidr: 10.244.0.0/16
      disableBGPExport: false
      disableNewAllocations: false
      encapsulation: VXLANCrossSubnet
      name: new-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: all()
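If you only need the pool list without opening an editor, a jsonpath query gives the same view. A sketch:
$ kubectl get installation default \
  -o jsonpath='{range .spec.calicoNetwork.ipPools[*]}{.name}{"\t"}{.cidr}{"\n"}{end}'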
At last we can see how multiple ipPools are laid out, but the hierarchy doesn't match the YAML given in the Migrate from one IP pool to another doc. No wonder nothing I tried from the doc would apply! (Cue plenty of internal swearing.)
# the doc's incorrect hierarchy
- name: new-ipv4-pool
  cidr: 10.0.0.0/16
  encapsulation: IPIP
That completes step one of the doc. Next up, step two, changing the nodeSelector condition, and step three, testing by deleting a Pod:
$ kubectl edit installation default
# change the nodeSelector condition of the name: default-ipv4-ippool entry
- nodeSelector: all()
+ nodeSelector: "!all()"
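The same edit can also be scripted with a JSON patch instead of an interactive editor. A sketch, assuming default-ipv4-ippool sits at index 0 of the ipPools list:
$ kubectl patch installation default --type=json \
  -p='[{"op":"replace","path":"/spec/calicoNetwork/ipPools/0/nodeSelector","value":"!all()"}]'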
$ kubectl delete pod -n kube-system coredns-668d6bf9bc-57wr8
$ kubectl get pod -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP
kube-system coredns-668d6bf9bc-czbh6 1/1 Running 0 16h 192.168.214.64
kube-system coredns-668d6bf9bc-lmhdf 1/1 Running 0 9s 10.244.214.64
Excellent: after the delete, the newly created coredns Pod is assigned an IP from 10.244.x.x.
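Note that existing Pods keep their old addresses until recreated. A quick way to spot anything still holding an IP from the old pool, as a sketch:
$ kubectl get pod -A -o wide | grep '192\.168\.'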
Reinstall - Summary
At this point we've resolved the IP Pool mess caused by the configuration change after the Calico 3.27 upgrade, and reconfigured the 10.244.x.x IP Pool so the Kubernetes Cluster can use it normally. We also learned that defining multiple IP Pools in the initial custom-resources.yaml would save a lot of work. But there's a first time for everything, and with this experience we'll know how to do it quickly next time.
Still Want to Take the Calico Upgrade Path
We've taken quite a few detours and now have a rough grasp of Calico's installation resource. With that understanding, can the straight Calico upgrade path succeed after all?
Restore the system to the state from "Reproducing the Calico 3.28.1 IPPool Issue" above.
$ kubectl get pod -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP
calico-apiserver calico-apiserver-55c4dd957b-57wr8 1/1 Running 0 87s 192.168.214.65
calico-apiserver calico-apiserver-55c4dd957b-qhplx 1/1 Running 0 61s 192.168.214.69
calico-system calico-kube-controllers-9b76ddc4d-qtz6f 1/1 Running 0 71s 192.168.214.67
calico-system calico-node-ctsjn 1/1 Running 0 84s 192.168.56.10
calico-system calico-typha-f56bc7888-4xt4w 1/1 Running 0 85s 192.168.56.10
calico-system csi-node-driver-9rtts 2/2 Running 0 68s 192.168.214.68
calico-system goldmane-5f56496f4c-czbh6 1/1 Running 0 86s 192.168.214.64
calico-system whisker-74544fd9d6-zzsn5 2/2 Running 0 37s 192.168.214.70
kube-system coredns-668d6bf9bc-8n7tj 1/1 Running 0 32h 10.244.214.68
kube-system coredns-668d6bf9bc-xck5j 1/1 Running 0 32h 10.244.214.70
tigera-operator tigera-operator-747864d56d-td8xp 1/1 Running 0 108s 192.168.56.10
$ calicoctl get ippool -o wide
NAME CIDR NAT IPIPMODE VXLANMODE DISABLED DISABLEBGPEXPORT SELECTOR ASSIGNMENTMODE
default-ipv4-ippool 192.168.0.0/16 true Never CrossSubnet false false all() Automatic
new-pool 10.244.0.0/16 true Never CrossSubnet false false all() Automatic
The coredns Pods still hold old 10.244.x.x IPs. Let's delete them so everything picks up an IP from the current default 192.168.x.x pool.
$ kubectl delete pod -n kube-system coredns-668d6bf9bc-8n7tj
$ kubectl delete pod -n kube-system coredns-668d6bf9bc-xck5j
$ kubectl get pod -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP
kube-system coredns-668d6bf9bc-69j9h 1/1 Running 0 90s 192.168.214.72
kube-system coredns-668d6bf9bc-9cqzm 1/1 Running 0 7m51s 192.168.214.71
Now let's delete the old new-pool, since Calico 3.28 and later no longer use that style of configuration.
$ calicoctl delete pool new-pool
This time, having learned how to edit the installation, we add the new-ipv4-ippool YAML below:
$ kubectl edit installation default
    - allowedUses:
      - Workload
      - Tunnel
      assignmentMode: Automatic
      blockSize: 26
      cidr: 10.244.0.0/16
      disableBGPExport: false
      disableNewAllocations: false
      encapsulation: VXLANCrossSubnet
      name: new-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: all()
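Equivalently, the pool can be appended non-interactively with a JSON-patch add on the list. A sketch, assuming spec.calicoNetwork.ipPools already exists:
$ kubectl patch installation default --type=json -p='[
  {"op": "add", "path": "/spec/calicoNetwork/ipPools/-", "value": {
    "name": "new-ipv4-ippool",
    "blockSize": 26,
    "cidr": "10.244.0.0/16",
    "encapsulation": "VXLANCrossSubnet",
    "natOutgoing": "Enabled",
    "nodeSelector": "all()"}}]'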
This time it worked on the first try: new-ipv4-ippool was added successfully.
$ kubectl get ippools
NAME CREATED AT
default-ipv4-ippool 2025-08-20T13:31:03Z
new-ipv4-ippool 2025-08-20T13:52:20Z
Adjust the nodeSelector condition:
$ kubectl edit installation default
# change the nodeSelector condition of the name: default-ipv4-ippool entry
- nodeSelector: all()
+ nodeSelector: "!all()"
When testing by deleting a coredns Pod at this point, the replacement still got a 192.168.x.x address. My first reaction: no way, it's still broken! Then again, since it's broken anyway, let's delete default-ipv4-ippool so it has nothing else to choose from.
$ kubectl delete ippools default-ipv4-ippool
ippool.projectcalico.org "default-ipv4-ippool" deleted
$ kubectl delete pod -n kube-system coredns-668d6bf9bc-69j9h
pod "coredns-668d6bf9bc-69j9h" deleted
$ kubectl get pod -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP
kube-system coredns-668d6bf9bc-srjw7 1/1 Running 0 11s 10.244.214.75
kube-system coredns-668d6bf9bc-wcsr8 1/1 Running 0 2m42s 192.168.214.75
Yes! It's assigned from 10.244.x.x now. To be safe, let's deploy a web application:
$ kubectl apply -f teamteched.yaml
service/teamteched-service created
deployment.apps/teamteched-deployment created
$ kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP
teamteched-deployment-6987bb4548-k8jbq 1/1 Running 0 51s 10.244.214.77
teamteched-deployment-6987bb4548-tqtk5 1/1 Running 0 51s 10.244.214.78
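To confirm at the IPAM level which pool the new allocations come from, calicoctl offers an IPAM view. A sketch:
# Shows allocation blocks and which IP pool each belongs to
$ calicoctl ipam show --show-blocks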
Now test access to the web service:
$ curl -L http://localhost:30000/teamdoc/
<!DOCTYPE html>
<html lang=en>
<head>
# omitted
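The curl above implies teamteched-service is a NodePort Service on port 30000 (teamteched.yaml is the author's own manifest and isn't shown here). The usual check would be:
$ kubectl get svc teamteched-service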
The Pod content is reachable through the Service, so the whole Kubernetes network path is working. However, going the Calico upgrade route leaves one slightly odd thing here.
$ kubectl get ippools
NAME CREATED AT
default-ipv4-ippool 2025-08-20T13:58:41Z
new-ipv4-ippool 2025-08-20T13:52:20Z
$ kubectl edit installation default
spec:
  calicoNetwork:
    bgp: Enabled
    hostPorts: Enabled
    ipPools:
    - allowedUses:
      - Workload
      - Tunnel
      assignmentMode: Automatic
      blockSize: 26
      cidr: 192.168.0.0/16
      disableBGPExport: false
      disableNewAllocations: false
      encapsulation: VXLANCrossSubnet
      name: default-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: '!all()'
    - allowedUses:
      - Workload
      - Tunnel
      assignmentMode: Automatic
      blockSize: 26
      cidr: 10.244.0.0/16
      disableBGPExport: false
      disableNewAllocations: false
      encapsulation: VXLANCrossSubnet
      name: new-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: all()
The default-ipv4-ippool we deleted has come back, and notice that its nodeSelector is now set with single quotes as '!all()'. Reading the doc again more carefully: it removes the pool from the installation configuration, rather than deleting the ipPool with a command. After we also removed the default-ipv4-ippool entry from the installation configuration, it stayed gone. Let's verify once more with calicoctl:
$ calicoctl get pool
NAME CIDR SELECTOR
new-ipv4-ippool 10.244.0.0/16 all()
Perfect.
Calico Upgrade - Summary
With the experience of reinstalling Calico and reconfiguring the IP Pool behind us, we've finally learned how to set up a new IP Pool after upgrading Calico 3.27.