Wednesday, January 24, 2024

Kubernetes/k8s Dev Notes - Installing Kubeadm on Ubuntu 16.04 and Dealing with Outdated docker and containerd Versions

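I have already been using docker to package some very heavy jobs, such as building firmware. Now it's time to try kubeadm and move the maintenance of the whole system from docker to a Kubernetes cluster, so that k8s can manage compute resources from here on, for example by allocating compute units dynamically. This feels a lot like what I did with AWS autoscaling more than ten years ago; truly a familiar stranger.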

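This post only covers the startup problems hit after installing Kubeadm on Ubuntu 16.04. It does not cover other usage details, such as setting up node servers or connecting them to and joining the master server.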

Environment:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.6 LTS
Release:        16.04
Codename:       xenial

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
$ sudo apt update
$ sudo apt install kubeadm
$ sudo apt-mark hold kubelet kubeadm kubectl

$ dpkg -l  | grep kube
ii  kubeadm                 1.28.2-00          amd64        Kubernetes Cluster Bootstrapping Tool
ii  kubectl                 1.28.2-00          amd64        Kubernetes Command Line Tool
ii  kubelet                 1.28.2-00          amd64        Kubernetes Node Agent
ii  kubernetes-cni          1.2.0-00           amd64        Kubernetes CNI

Next:

$ sudo kubeadm init --v=5
...
validating the existence and emptiness of directory /var/lib/etcd
[preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"

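Something is wrong. From what I could find while troubleshooting, the problem is most likely tied to the docker and containerd versions, so the first step is to upgrade both as far as possible: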

$ dpkg -l | grep containerd
ii  containerd              1.2.6-0ubuntu1~16.04.6+esm1  amd64        daemon to control runC
$ dpkg -l | grep docker
rc  docker                                     1.5-1                                           amd64        System tray for KDE3/GNOME2 docklet applications
ii  docker.io                                  18.09.7-0ubuntu1~16.04.7                        amd64        Linux container runtime
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
$ echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
$ sudo apt update
$ sudo apt install docker-ce docker-ce-cli containerd.io

$ sudo docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:47 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:58 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Next I suspected the cri plugin and tried to rule it out:

$ cat /etc/containerd/config.toml | grep cri
enabled_plugins = ["cri"]
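
For context, the config.toml shipped with Docker's containerd.io package typically contains a `disabled_plugins = ["cri"]` line, which switches the CRI plugin off. A minimal sketch for spotting that case, assuming the single-line array form; `cri_disabled` is a hypothetical helper name, not part of containerd:

```shell
# Hypothetical helper: succeed if the given config.toml disables the CRI
# plugin through an uncommented, single-line disabled_plugins array.
cri_disabled() {
  grep -Eq '^[[:space:]]*disabled_plugins[[:space:]]*=.*"cri"' "$1"
}
```

It could be used as `cri_disabled /etc/containerd/config.toml && echo "CRI plugin is disabled"`.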

That didn't help, so keep going:

$ sudo mv /etc/containerd/config.toml /etc/containerd/config.toml.bak
$ containerd config default | sudo tee /etc/containerd/config.toml
$ sudo systemctl restart containerd
$ containerd config default | grep containerd.sock
  address = "/run/containerd/containerd.sock"
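
The kubeadm error names /var/run/containerd/containerd.sock while the default config says /run/containerd/containerd.sock. On Ubuntu /var/run is normally a symlink to /run, so both should point at the same socket. A small sketch to confirm that on a given machine, assuming GNU `readlink -f`; `same_path` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed when two paths resolve to the same location
# after following symlinks (relies on GNU readlink -f).
same_path() {
  a=$(readlink -f "$1") || return 1
  b=$(readlink -f "$2") || return 1
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example: `same_path /run/containerd/containerd.sock /var/run/containerd/containerd.sock && echo "same socket"`.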

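Trying kubeadm init again still fails the same way. Digging into the details, containerd is quite likely still too old: the key piece of information is that versions before 1.6 lack the CRI v1 interface that the error message complains about.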

$ dpkg -L containerd.io | grep bin
/usr/bin
/usr/bin/containerd-shim-runc-v2
/usr/bin/containerd-shim
/usr/bin/containerd
/usr/bin/runc
/usr/bin/ctr
/usr/bin/containerd-shim-runc-v1
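
The "older than 1.6" rule of thumb can be checked mechanically before running kubeadm init. The sketch below is not from the original troubleshooting session: `parse_containerd_version` and `version_ge` are hypothetical helper names, it assumes `containerd --version` prints the version as its third field, and it relies on GNU `sort -V`.

```shell
# Hypothetical helpers (not part of containerd or kubeadm).

# Extract the bare version number from a `containerd --version` line,
# e.g. "containerd github.com/containerd/containerd v1.7.11 <commit>" -> 1.7.11
parse_containerd_version() {
  printf '%s\n' "$1" | awk '{ sub(/^v/, "", $3); print $3 }'
}

# Succeed when version $1 >= version $2 (GNU sort -V does the version ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage might look like `version_ge "$(parse_containerd_version "$(containerd --version)")" 1.6.0 || echo "containerd is older than 1.6"`.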

So go straight to the official containerd project (containerd.io) and download the latest release, 1.7.11, as prebuilt binaries:

$ wget https://github.com/containerd/containerd/releases/download/v1.7.11/containerd-1.7.11-linux-amd64.tar.gz
$ tar xvf containerd-1.7.11-linux-amd64.tar.gz
$ tar -tzvf containerd-1.7.11-linux-amd64.tar.gz
drwxr-xr-x root/root         0 2023-12-09 07:41 bin/
-rwxr-xr-x root/root  12185600 2023-12-09 07:41 bin/containerd-shim-runc-v2
-rwxr-xr-x root/root  28330360 2023-12-09 07:41 bin/ctr
-rwxr-xr-x root/root   7061504 2023-12-09 07:41 bin/containerd-shim
-rwxr-xr-x root/root   8761344 2023-12-09 07:41 bin/containerd-shim-runc-v1
-rwxr-xr-x root/root  26184312 2023-12-09 07:41 bin/containerd-stress
-rwxr-xr-x root/root  55551616 2023-12-09 07:41 bin/containerd

Move the binaries already installed on the system out of the way:

$ sudo systemctl stop containerd
$ sudo mkdir -p /usr/bin/containerd-1.4.6
$ sudo mv /usr/bin/containerd* /usr/bin/containerd-1.4.6/
$ sudo mv /usr/bin/ctr /usr/bin/containerd-1.4.6/
$ tree /usr/bin/containerd-1.4.6/
/usr/bin/containerd-1.4.6/
├── containerd
├── containerd-shim
├── containerd-shim-runc-v1
├── containerd-shim-runc-v2
└── ctr

0 directories, 5 files

$ sudo cp ~/bin/c* /usr/bin/
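
One wrinkle in the commands above: the backup directory lives in /usr/bin and its name starts with containerd, so the glob containerd* also matches the directory itself and mv may complain about moving it into itself (the regular files are still moved). A sketch that skips directories, with `backup_containerd_bins` as a hypothetical helper name and both paths passed in as arguments:

```shell
# Hypothetical helper: move the containerd binaries and ctr from SRC into DEST,
# skipping anything that is not a regular file (such as DEST itself).
backup_containerd_bins() {
  src=$1
  dest=$2
  mkdir -p "$dest" || return 1
  for f in "$src"/containerd* "$src"/ctr; do
    if [ -f "$f" ]; then
      mv "$f" "$dest"/ || return 1
    fi
  done
}
```

On the real machine this would be run from a root shell, for example `backup_containerd_bins /usr/bin /usr/bin/containerd-1.4.6`.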

Get ready to restart:

$ containerd --version
containerd github.com/containerd/containerd v1.7.11 64b8a811b07ba6288238eefc14d898ee0b5b99ba
$ containerd config default | sudo tee /etc/containerd/config.toml
$ sudo systemctl stop containerd
$ sudo systemctl start containerd
$ sudo systemctl status containerd
● containerd.service - containerd container runtime
   Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
   Active: active (running); 14min ago
     Docs: https://containerd.io
  Process: 19396 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
 Main PID: 19406 (containerd)
    Tasks: 32
   Memory: 24.5M
      CPU: 187ms
   CGroup: /system.slice/containerd.service
           └─19406 /usr/bin/containerd
$ sudo systemctl stop docker
$ sudo systemctl start docker
$ sudo docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:47 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:58 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.7.11
  GitCommit:        64b8a811b07ba6288238eefc14d898ee0b5b99ba
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
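
When scripting this check, the containerd version can be picked out of the text output instead of being read by eye. A minimal sketch that assumes the plain-text layout shown above (a `containerd:` line followed by a `Version:` line); `containerd_version_from_docker` is a hypothetical helper name:

```shell
# Hypothetical helper: read `docker version` text on stdin and print the
# Version value listed under the "containerd:" component.
containerd_version_from_docker() {
  awk '
    /^[[:space:]]*containerd:/ { in_section = 1; next }
    in_section && /Version:/   { print $2; exit }
  '
}
```

For example: `sudo docker version | containerd_version_from_docker`.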

At last docker version reports containerd v1.7.11 as well, so it's time to go back to kubeadm:

$ sudo kubeadm init  --v=5
....

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join ip:6443 --token ###### --discovery-token-ca-cert-hash sha256:###### 

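In addition, Kubernetes itself recommends turning swap off to keep overall performance predictable. I am running on a machine that already relies on swap and cannot turn it off, so I had to find a way to skip the swap check (by adding --fail-swap-on=false):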

$ cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf | grep ExecStart
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS --fail-swap-on=false
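
Before deciding whether the flag is needed, it can help to confirm that swap is actually active. On Linux, /proc/swaps holds one header line followed by one line per active swap area. A minimal sketch; `swap_active` is a hypothetical helper name, and the file path is a parameter so the function can be tried against a sample file:

```shell
# Hypothetical helper: succeed if the /proc/swaps-style file has at least one
# entry after its header line, meaning swap is in use.
swap_active() {
  file=${1:-/proc/swaps}
  [ "$(awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$file")" -gt 0 ]
}
```

For example: `swap_active && echo "swap is on; kubelet will refuse to start unless fail-swap-on is false"`.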
