Fixing the IP Pool That Cannot Be Modified After Upgrading from Calico 3.27 to a Newer Version

Recap: whenever we upgrade Calico 3.27 to any release from Calico 3.28 onward, the default IP Pool jumps back to 192.168.x.x and cannot be changed.

$ calicoctl version
Client Version:    v3.27.4
Git commit:        2183fee02
Cluster Version:   v3.27.4
Cluster Type:      typha,kdd,k8s,operator,bgp,kubeadm
$ calicoctl get ippool -o wide
NAME       CIDR            NAT    IPIPMODE   VXLANMODE     DISABLED   DISABLEBGPEXPORT   SELECTOR
new-pool   10.244.0.0/16   true   Never      CrossSubnet   false      false              all()

Reproducing the Calico 3.28.1 IPPool Issue

We upgrade the Kubernetes Cluster above to Calico 3.30, the latest release at the time of writing. (Don't follow along with this part!)

curl https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/operator-crds.yaml -O
curl https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/tigera-operator.yaml -O

kubectl apply --server-side --force-conflicts -f operator-crds.yaml
kubectl apply --server-side --force-conflicts -f tigera-operator.yaml
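
Between applying the operator manifests and creating the resources below, it can help to wait until the operator Deployment has finished rolling out. A minimal check, assuming the default names from the manifest:

$ kubectl rollout status deployment tigera-operator -n tigera-operator
# blocks until the new tigera-operator Pod is up and available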

kubectl apply -f - <<EOF
apiVersion: operator.tigera.io/v1
kind: Goldmane
metadata:
  name: default
---
apiVersion: operator.tigera.io/v1
kind: Whisker
metadata:
  name: default
EOF

The calico-node update is a bit slower; give it some time.

$ kubectl get pod -A
NAMESPACE          NAME                                       READY   STATUS    RESTARTS      AGE
calico-apiserver   calico-apiserver-694c587998-fpknn          1/1     Running   0             69s
calico-apiserver   calico-apiserver-694c587998-jfr8z          1/1     Running   0             60s
calico-system      calico-kube-controllers-557c794b8d-9zwr6   1/1     Running   0             65s
calico-system      calico-node-2zpx7                          1/1     Running   1 (19d ago)   19d
calico-system      calico-node-7678c                          1/1     Running   2 (19d ago)   19d
calico-system      calico-node-llqhh                          0/1     Running   0             66s
calico-system      calico-typha-6f49c7766d-jgktb              1/1     Running   0             66s
calico-system      calico-typha-6f49c7766d-rn6hv              1/1     Running   0             66s
calico-system      csi-node-driver-rhgd5                      2/2     Running   0             54s
calico-system      csi-node-driver-tfbmv                      2/2     Running   0             25s
calico-system      csi-node-driver-xjd55                      2/2     Running   0             65s
calico-system      goldmane-5f56496f4c-npgpf                  1/1     Running   0             68s
calico-system      whisker-58796f545-dw7p6                    2/2     Running   0             29s
tigera-operator    tigera-operator-747864d56d-vn8mf           1/1     Running   0             97s
# output trimmed
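
Rather than eyeballing Pods, the operator also reports per-component progress through the cluster-scoped tigerastatus resources it creates. A quick check along these lines (the exact component list varies by version):

$ kubectl get tigerastatus
# wait until the AVAILABLE column reads True for every component before moving on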

Update calicoctl to 3.30 as well while we're at it.

$ cd /usr/local/bin/
$ sudo curl -L https://github.com/projectcalico/calico/releases/download/v3.30.2/calicoctl-linux-amd64 -o calicoctl
$ sudo chmod +x ./calicoctl

And the IP Pool issue from the recap shows up right away.

$ calicoctl get ippool -o wide
NAME                  CIDR             NAT    IPIPMODE   VXLANMODE     DISABLED   DISABLEBGPEXPORT   SELECTOR   ASSIGNMENTMODE
default-ipv4-ippool   192.168.0.0/16   true   Never      CrossSubnet   false      false              all()      Automatic
new-pool              10.244.0.0/16    true   Never      CrossSubnet   false      false              all()      Automatic

If upgrading doesn't work, what about removing and reinstalling?

Restore the snapshot to the point where Calico is still at 3.27 and the default-ipv4-ippool (192.168.x.x) has already been deleted, leaving only new-pool (10.244.x.x).

$ calicoctl get ippool -o wide
NAME       CIDR            NAT    IPIPMODE   VXLANMODE     DISABLED   DISABLEBGPEXPORT   SELECTOR
new-pool   10.244.0.0/16   true   Never      CrossSubnet   false      false              all()

After going back and forth: what if, instead of the direct upgrade path, I tear the Calico CNI out and put it back, that is, remove Calico 3.27 and install Calico 3.30 from scratch?

Following the original Calico 3.27 installation docs in reverse, delete everything:

# First put every node into maintenance mode
$ kubectl drain <nodename> --ignore-daemonsets
# Clean out all of the 3.27 IP Pool settings
$ calicoctl delete pool new-pool
$ kubectl delete -f custom-resources.yaml
# This will hang at `apiserver.operator.tigera.io "default" deleted`; open another terminal session and run the uncordon below so it can continue
$ kubectl uncordon <nodename>
$ kubectl delete -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.5/manifests/tigera-operator.yaml

You can see that every Pod in the calico-system namespace gets deleted except the csi-node-driver Pod.

$ kubectl get pod -A -o wide
NAMESPACE       NAME                                     READY   STATUS              IP              
calico-system   csi-node-driver-6hzz9                    0/2     Terminating         10.244.214.71   
kube-system     coredns-668d6bf9bc-qtz6f                 0/1     ContainerCreating   <none>          
kube-system     coredns-668d6bf9bc-wxm84                 0/1     ContainerCreating   <none>          
kube-system     etcd-twlab-cp01                          1/1     Running             192.168.56.10   
kube-system     kube-apiserver-twlab-cp01                1/1     Running             192.168.56.10   
kube-system     kube-controller-manager-twlab-cp01       1/1     Running             192.168.56.10   
kube-system     kube-proxy-n7dfj                         1/1     Running             192.168.56.10   
kube-system     kube-scheduler-twlab-cp01                1/1     Running             192.168.56.10   

The csi-node-driver Pod left on the Control Plane has to be deleted manually with the --force flag.

$ kubectl delete pod -n calico-system csi-node-driver-6hzz9 --force
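
Before reinstalling, a couple of quick queries can confirm the teardown really finished. CRDs are cluster-scoped, so anything still listed here will simply be re-applied by the manifests in the next step; a sketch:

$ kubectl get ns | grep -E 'calico|tigera'
$ kubectl get crds | grep -E 'projectcalico|tigera'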

At this point Calico 3.27 is completely removed. Next, install Calico 3.30:

$ kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/operator-crds.yaml
$ kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/tigera-operator.yaml
$ sudo curl https://raw.githubusercontent.com/projectcalico/calico/v3.30.2/manifests/custom-resources.yaml -O
# cidr: 192.168.0.0/16
$ kubectl create -f custom-resources.yaml

Don't run kubectl create -f custom-resources.yaml just yet; please read the article to the end first.

With Calico 3.27 removed, we are effectively back to the state before any CNI was installed. Remember that this Kubernetes Cluster's pod network was originally set to 192.168.0.0/16, so we go with the defaults first so that the CNI matches the pod network and comes up.
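
If you don't remember which pod CIDR the cluster was initialized with, on a kubeadm cluster it can be read back from the kubeadm-config ConfigMap. A minimal sketch, assuming kubeadm defaults:

$ kubectl -n kube-system get configmap kubeadm-config -o yaml | grep podSubnet
    podSubnet: 192.168.0.0/16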

$ kubectl get pod -A -o wide
NAMESPACE          NAME                                       READY   STATUS    RESTARTS   AGE     IP               
calico-apiserver   calico-apiserver-857c6f69f9-2t2nn          1/1     Running   0          2m30s   192.168.214.68   
calico-apiserver   calico-apiserver-857c6f69f9-9rtts          1/1     Running   0          2m30s   192.168.214.71   
calico-system      calico-kube-controllers-6585585f8b-hpmzf   1/1     Running   0          2m27s   192.168.214.69   
calico-system      calico-node-srjw7                          1/1     Running   0          2m27s   192.168.56.10    
calico-system      calico-typha-7cc59465f7-wcsr8              1/1     Running   0          2m28s   192.168.56.10    
calico-system      csi-node-driver-zg8n7                      2/2     Running   0          2m27s   192.168.214.65   
calico-system      goldmane-5f56496f4c-6jdsh                  1/1     Running   0          2m28s   192.168.214.70   
calico-system      whisker-7bdc688f49-c6jhn                   2/2     Running   0          2m23s   192.168.214.72   
kube-system        coredns-668d6bf9bc-57wr8                   1/1     Running   0          9m15s   192.168.214.66   
kube-system        coredns-668d6bf9bc-czbh6                   1/1     Running   0          9m15s   192.168.214.64   
tigera-operator    tigera-operator-747864d56d-qtz6f           1/1     Running   0          3m22s   192.168.56.10

$ calicoctl version
Client Version:    v3.30.2
Git commit:        cf50b5622
Cluster Version:   v3.30.2
Cluster Type:      typha,kdd,k8s,operator,bgp,kubeadm

$ calicoctl get ippool -o wide
NAME                  CIDR             NAT    IPIPMODE   VXLANMODE     DISABLED   DISABLEBGPEXPORT   SELECTOR   ASSIGNMENTMODE
default-ipv4-ippool   192.168.0.0/16   true   Never      CrossSubnet   false      false              all()      Automatic

If the Calico Pods come up and get IPs from 192.168.0.0/16, the reinstall is fine. That wraps up the move from Calico 3.27 to Calico 3.30 (even though we got there by removing and reinstalling rather than upgrading).

Migrate from one IP pool to another

One more time: we want to switch IP Pools, following the new version of the Migrate from one IP pool to another doc.

$ kubectl edit installation default

spec:
  calicoNetwork:
    ipPools:
    - allowedUses:
      - Workload
      - Tunnel
      assignmentMode: Automatic
      blockSize: 26
      cidr: 192.168.0.0/16
      disableBGPExport: false
      disableNewAllocations: false
      encapsulation: VXLANCrossSubnet
      name: default-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: all()

But parts of that doc are poorly written, and it's not obvious where to start editing. Referring back to the custom-resources.yaml from the initial install and the structure of the initial ippool makes it much clearer how to proceed:

$ cat custom-resources.yaml
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    # change the name and cidr
    ipPools:
    - name: new-ipv4-ippool
      blockSize: 26
      cidr: 10.244.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()

For a configuration with multiple ipPools, see the examples in the Create multiple IP pools doc. While reading it I noticed something: you can define multiple ipPools in the initial custom-resources.yaml and load them all at once! That's why I left the comment above; by now you can see where this is going.

In other words, the initial custom-resources.yaml can already define several different IP Pools, for example:

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 192.168.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
    - name: new-ipv4-ippool
      blockSize: 26
      cidr: 10.244.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()

Then I really wanted to just overwrite the Installation configuration directly:

$ kubectl create -f custom-resources.yaml
Error from server (AlreadyExists): error when creating "custom-resources.yaml": installations.operator.tigera.io "default" already exists
$ kubectl apply -f custom-resources.yaml
Warning: resource installations/default is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
installation.operator.tigera.io/default configured

Ha, it looks like kubectl apply did update the Installation configuration for me.
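
For what it's worth, a server-side apply (the same flags we used for the upgrade manifests earlier) should also update the Installation without triggering the last-applied-configuration warning; a sketch:

$ kubectl apply --server-side --force-conflicts -f custom-resources.yaml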

$ kubectl get ippools
NAME                  CREATED AT
default-ipv4-ippool   2025-08-19T15:35:03Z
new-ipv4-ippool       2025-08-20T07:41:27Z

new-ipv4-ippool is there; let's quickly look at how the installation configuration is written now:

$ kubectl edit installation default
spec:
  calicoNetwork:
    ipPools:
    - allowedUses:
      - Workload
      - Tunnel
      assignmentMode: Automatic
      blockSize: 26
      cidr: 192.168.0.0/16
      disableBGPExport: false
      disableNewAllocations: false
      encapsulation: VXLANCrossSubnet
      name: default-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: all()
    - allowedUses:
      - Workload
      - Tunnel
      assignmentMode: Automatic
      blockSize: 26
      cidr: 10.244.0.0/16
      disableBGPExport: false
      disableNewAllocations: false
      encapsulation: VXLANCrossSubnet
      name: new-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: all()

Finally, the multi-ipPools layout. But it doesn't match the YAML hierarchy given in the Migrate from one IP pool to another doc, no wonder nothing I tried from that doc would apply! (Cue a lot of swearing under my breath.)

# the incorrect hierarchy from the doc
- name: new-ipv4-pool
  cidr: 10.0.0.0/16
  encapsulation: IPIP

That completes step one of the doc. Next, step two, changing the nodeSelector condition, and step three, the delete-a-Pod test:

$ kubectl edit installation default

# change the nodeSelector condition of the pool named default-ipv4-ippool
- nodeSelector: all()
+ nodeSelector: "!all()"
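
Before recycling any Pods, the selector change can be confirmed on the IPPool resource itself, since the operator propagates the Installation change to the pool it manages. A quick check:

$ calicoctl get ippool -o wide
# default-ipv4-ippool should now show SELECTOR !all(), while new-ipv4-ippool keeps all()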

$ kubectl delete pod -n kube-system coredns-668d6bf9bc-57wr8
$ kubectl get pod -A -o wide
NAMESPACE          NAME                                       READY   STATUS    RESTARTS   AGE   IP               
kube-system        coredns-668d6bf9bc-czbh6                   1/1     Running   0          16h   192.168.214.64   
kube-system        coredns-668d6bf9bc-lmhdf                   1/1     Running   0          9s    10.244.214.64

Excellent: after the delete, the newly created coredns Pod gets an IP from 10.244.x.x.
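
To watch the allocations shift from one pool to the other as Pods get recycled, calicoctl's IPAM view is handy; a sketch:

$ calicoctl ipam show --show-blocks
# blocks carved out of 10.244.0.0/16 appear as new Pods come up, and the in-use counts under 192.168.0.0/16 drop as old Pods go away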

Reinstall - Summary

At this point we have resolved the IP Pools mess caused by the configuration differences after the Calico 3.27 upgrade, and reconfigured the 10.244.x.x IP Pool so the Kubernetes Cluster can use it normally. We also learned that defining multiple IP Pools in custom-resources.yaml right from the start would have saved a lot of work. But there is a first time for everything, and with this experience we'll know how to get through it quickly next time.

Still Want to Take the Calico Upgrade Path

That was a long detour, but it gave us a rough grasp of Calico's Installation resource. With that understanding, can the Calico upgrade path actually work?

Restore the system to the state from "Reproducing the Calico 3.28.1 IPPool Issue" above.

$ kubectl get pod -A -o wide
NAMESPACE          NAME                                      READY   STATUS    RESTARTS   AGE    IP               
calico-apiserver   calico-apiserver-55c4dd957b-57wr8         1/1     Running   0          87s    192.168.214.65   
calico-apiserver   calico-apiserver-55c4dd957b-qhplx         1/1     Running   0          61s    192.168.214.69   
calico-system      calico-kube-controllers-9b76ddc4d-qtz6f   1/1     Running   0          71s    192.168.214.67   
calico-system      calico-node-ctsjn                         1/1     Running   0          84s    192.168.56.10     
calico-system      calico-typha-f56bc7888-4xt4w              1/1     Running   0          85s    192.168.56.10     
calico-system      csi-node-driver-9rtts                     2/2     Running   0          68s    192.168.214.68   
calico-system      goldmane-5f56496f4c-czbh6                 1/1     Running   0          86s    192.168.214.64   
calico-system      whisker-74544fd9d6-zzsn5                  2/2     Running   0          37s    192.168.214.70   
kube-system        coredns-668d6bf9bc-8n7tj                  1/1     Running   0          32h    10.244.214.68    
kube-system        coredns-668d6bf9bc-xck5j                  1/1     Running   0          32h    10.244.214.70    
tigera-operator    tigera-operator-747864d56d-td8xp          1/1     Running   0          108s   192.168.56.10

$ calicoctl get ippool -o wide
NAME                  CIDR             NAT    IPIPMODE   VXLANMODE     DISABLED   DISABLEBGPEXPORT   SELECTOR   ASSIGNMENTMODE
default-ipv4-ippool   192.168.0.0/16   true   Never      CrossSubnet   false      false              all()      Automatic
new-pool              10.244.0.0/16    true   Never      CrossSubnet   false      false              all()      Automatic

The coredns Pods still hold old 10.244.x.x IPs. Delete them so they all pick up addresses from the current system default, 192.168.x.x.

$ kubectl delete pod -n kube-system coredns-668d6bf9bc-8n7tj  
$ kubectl delete pod -n kube-system coredns-668d6bf9bc-xck5j

$ kubectl get pod -A -o wide
NAMESPACE          NAME                                      READY   STATUS    RESTARTS   AGE     IP               
kube-system        coredns-668d6bf9bc-69j9h                  1/1     Running   0          90s     192.168.214.72   
kube-system        coredns-668d6bf9bc-9cqzm                  1/1     Running   0          7m51s   192.168.214.71   

Let's delete the old new-pool, since versions after Calico 3.28 no longer use this style of configuration.
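
Before deleting a pool it's worth confirming that nothing still holds an address from it; the coredns Pods were recycled above for exactly this reason, so a quick grep should come back empty:

$ kubectl get pod -A -o wide | grep '10\.244\.'
# no output expected; with nothing left on the old pool, the delete below is safe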

$ calicoctl delete pool new-pool

This time we already know how to edit the Installation, so add the following new-ipv4-ippool YAML:

$ kubectl edit installation default

    - allowedUses:
      - Workload
      - Tunnel
      assignmentMode: Automatic
      blockSize: 26
      cidr: 10.244.0.0/16
      disableBGPExport: false
      disableNewAllocations: false
      encapsulation: VXLANCrossSubnet
      name: new-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: all()

This time new-ipv4-ippool was added successfully on the first try.

$ kubectl get ippools
NAME                  CREATED AT
default-ipv4-ippool   2025-08-20T13:31:03Z
new-ipv4-ippool       2025-08-20T13:52:20Z

Adjust the nodeSelector condition:

$ kubectl edit installation default

# change the nodeSelector condition of the pool named default-ipv4-ippool
- nodeSelector: all()
+ nodeSelector: "!all()"

When running the delete-coredns test here, the new Pod still got a 192.168.x.x address. My first reaction was: no way, it's still broken! Then again, since it's broken anyway, I might as well delete default-ipv4-ippool so there is nothing else to pick from.

$ kubectl delete ippools default-ipv4-ippool
ippool.projectcalico.org "default-ipv4-ippool" deleted

$ kubectl delete pod -n kube-system coredns-668d6bf9bc-69j9h
pod "coredns-668d6bf9bc-69j9h" deleted

$ kubectl get pod -A -o wide
NAMESPACE          NAME                                      READY   STATUS    RESTARTS   AGE     IP               
kube-system        coredns-668d6bf9bc-srjw7                  1/1     Running   0          11s     10.244.214.75    
kube-system        coredns-668d6bf9bc-wcsr8                  1/1     Running   0          2m42s   192.168.214.75   

Yes, it is assigned from 10.244.x.x now. To be safe, let's deploy a web application:

$ kubectl apply -f teamteched.yaml
service/teamteched-service created
deployment.apps/teamteched-deployment created
$ kubectl get pod -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP              
teamteched-deployment-6987bb4548-k8jbq   1/1     Running   0          51s   10.244.214.77   
teamteched-deployment-6987bb4548-tqtk5   1/1     Running   0          51s   10.244.214.78
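
The teamteched.yaml manifest itself isn't shown in this post. A minimal sketch of a Deployment plus NodePort Service that would produce an equivalent setup is below; the image, labels, and the /teamdoc content are assumptions, not the actual file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: teamteched-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: teamteched
  template:
    metadata:
      labels:
        app: teamteched
    spec:
      containers:
      - name: web
        image: nginx:1.27   # assumption: the real image serves /teamdoc/
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: teamteched-service
spec:
  type: NodePort
  selector:
    app: teamteched
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30000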

Test access to the web service:

$ curl -l http://localhost:30000/teamdoc/
<!DOCTYPE html>
<html lang=en>
<head>
# output trimmed

The Pod content is reachable through the Service, which means the whole Kubernetes network path is fine. However, going the Calico upgrade route leaves one slightly odd thing here.

$ kubectl get ippools
NAME                  CREATED AT
default-ipv4-ippool   2025-08-20T13:58:41Z
new-ipv4-ippool       2025-08-20T13:52:20Z

$ kubectl edit installation default
spec:
  calicoNetwork:
    bgp: Enabled
    hostPorts: Enabled
    ipPools:
    - allowedUses:
      - Workload
      - Tunnel
      assignmentMode: Automatic
      blockSize: 26
      cidr: 192.168.0.0/16
      disableBGPExport: false
      disableNewAllocations: false
      encapsulation: VXLANCrossSubnet
      name: default-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: '!all()'
    - allowedUses:
      - Workload
      - Tunnel
      assignmentMode: Automatic
      blockSize: 26
      cidr: 10.244.0.0/16
      disableBGPExport: false
      disableNewAllocations: false
      encapsulation: VXLANCrossSubnet
      name: new-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: all()

The default-ipv4-ippool we deleted has come back, and note that its nodeSelector is set with single quotes as '!all()'. Reading the doc again more carefully, it says to remove the pool from the Installation configuration rather than delete the IPPool with a command. Once we removed the default-ipv4-ippool entry from the Installation configuration, everything behaved, as sketched below.
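
A minimal sketch of that edit: open the Installation and delete the whole default-ipv4-ippool entry from spec.calicoNetwork.ipPools, leaving only the new pool (the operator fills defaulted fields such as allowedUses back in on its own):

$ kubectl edit installation default

spec:
  calicoNetwork:
    ipPools:
    - name: new-ipv4-ippool
      blockSize: 26
      cidr: 10.244.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()

With that entry gone, the operator drops the pool for good. Verify again with calicoctl: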

$ calicoctl get pool
NAME              CIDR            SELECTOR
new-ipv4-ippool   10.244.0.0/16   all()

Perfect.

Calico Upgrade - Summary

With the experience of reinstalling Calico and reconfiguring the IP Pool behind us, we have finally learned how to set up a new IP Pool after upgrading from Calico 3.27.
