Introduction

This post is day 14 of the Kubernetes Advent Calendar.

Are you all running Kubernetes at home? I am.

Home Kubernetes comes in many flavors. This time I'll share a tip for adding an NVIDIA GPU to a cluster (a consumer GeForce card rather than a Quadro, since this is a home setup).

About NVIDIA Docker

NVIDIA Docker is a runtime provided by NVIDIA for using GPUs from containers. For various historical reasons, it is now called NVIDIA Container Toolkit.

It is currently the recommended way to use GPUs with Docker, and I suspect that most of you who have been running NVIDIA GPUs on Kubernetes (for machine learning and the like) have had it running behind the scenes.


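Last week I wrote that article about Docker being deprecated, and while I was researching migration paths away from Docker for various use cases, Kazuki Suda (@superbrothers) told me about nvidia-container-runtime.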

I don't think many people are using NVIDIA's GPU Operator yet, so rather than waiting for it to support this, it seems better to "make nvidia-container-runtime containerd's default runtime on GPU nodes." https://t.co/BvtCZ7Ji0y
— すぱぶら (Kazuki Suda) (@superbrothers) December 3, 2020

He said that combining it with containerd lets you use GPUs on Kubernetes without Docker, so I gave it a try.

About nvidia-container-runtime

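It appears to be a wrapper around runc: it intercepts the call to runc, adds a hook that sets up the GPU for the container, and then hands off to runc. So, in principle, containers are created and deleted along the chain kubelet -> containerd (or CRI-O) -> nvidia-container-runtime (-> runc).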

Prerequisites and what to install

Apart from Docker itself, the requirements are basically the same as the dependencies of NVIDIA Container Toolkit:

GPU driver
nvidia-container-runtime
containerd (configured)

That's roughly all you need.

Installing the GPU driver

Following NVIDIA's driver installation guide on docs.nvidia.com, install the driver on Ubuntu.

sudo apt-get install linux-headers-$(uname -r)
# Ensure packages on the CUDA network repository have priority over the Canonical repository.
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-$distribution.pin
sudo mv cuda-$distribution.pin /etc/apt/preferences.d/cuda-repository-pin-600
# Install the CUDA repository public GPG key. Note that on Ubuntu 16.04, replace https with http in the command below.
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/7fa2af80.pub
# Setup the CUDA network repository.
echo "deb http://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
# Update the APT repository cache and install the driver using the cuda-drivers meta-package. Use the --no-install-recommends option for a lean driver install without any dependencies on X packages. This is particularly useful for headless installations on cloud instances.
sudo apt-get update
sudo apt-get -y install cuda-drivers

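On Ubuntu Server this pulled in a huge number of dependency packages... For production, it's probably worth being selective about dependencies (for example with the --no-install-recommends option mentioned in the comment above).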

At this point, you should be able to query the GPU with nvidia-smi.

# nvidia-smi
Sat Dec 12 19:32:29 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01    Driver Version: 455.45.01    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1660    On   | 00000000:01:00.0 Off |                  N/A |
| 35%   31C    P8    15W / 120W |     13MiB /  5944MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       955      G   /usr/lib/xorg/Xorg                  8MiB |
|    0   N/A  N/A      1032      G   /usr/bin/gnome-shell                2MiB |
+-----------------------------------------------------------------------------+
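If you want the same information in a form that's easier to use from scripts, nvidia-smi also has a query mode. Here is a small sketch; gpu_summary is a hypothetical helper of my own, and it assumes the "name, driver_version, memory.total" CSV layout that --query-gpu with --format=csv,noheader prints, one line per GPU.

```shell
# gpu_summary is a hypothetical helper (not part of nvidia-smi).
# It reads the CSV printed by:
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
# (one "name, driver_version, memory.total" line per GPU) and prints a short summary.
gpu_summary() {
  # nvidia-smi separates CSV fields with ", "; lines with fewer than 3 fields are skipped.
  awk -F', ' 'NF >= 3 { printf "%s: driver %s, %s total\n", $1, $2, $3 }'
}
```

Usage: nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader | gpu_summary. For the GTX 1660 above, that should print something along the lines of "GeForce GTX 1660: driver 455.45.01, 5944 MiB total".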

Installing nvidia-container-runtime

curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
sudo apt-get install -y nvidia-container-runtime

That's it for the installation. Next, configure containerd so that it can call the runtime.

Installing and configuring containerd

Set things up following the official Kubernetes documentation:

cat <<EOF | sudo tee /etc/modules-load.d/containerd.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Setup required sysctl params, these persist across reboots.
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

# Apply sysctl params without reboot
sudo sysctl --system

# (Install containerd)
## Set up the repository
### Install packages to allow apt to use a repository over HTTPS
sudo apt-get update && sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common

## Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key --keyring /etc/apt/trusted.gpg.d/docker.gpg add -

## Add Docker apt repository.
sudo add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) \
    stable"

## Install containerd
sudo apt-get update && sudo apt-get install -y containerd.io

# Configure containerd
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml
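The net.bridge.* sysctls only exist once br_netfilter is loaded, so it's worth confirming that the settings at the top of the listing above are actually live. A minimal sketch, where check_cri_sysctls is a hypothetical helper name: it reads each key with sysctl -n and flags anything that isn't 1.

```shell
# check_cri_sysctls is a hypothetical helper.
# It checks that each sysctl from 99-kubernetes-cri.conf currently has the value 1.
check_cri_sysctls() {
  rc=0
  for key in net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward net.bridge.bridge-nf-call-ip6tables; do
    # sysctl -n prints only the value. Empty output means the key is absent
    # (the net.bridge.* keys only exist once br_netfilter is loaded).
    val=$(sysctl -n "$key" 2>/dev/null)
    if [ "$val" = "1" ]; then
      echo "OK: $key = 1"
    else
      echo "NG: $key = ${val:-<missing>}"
      rc=1
    fi
  done
  return "$rc"
}
```

Run check_cri_sysctls after sudo sysctl --system; it returns non-zero if any key is missing or not set to 1.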

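At this point, containerd's config file has been generated at /etc/containerd/config.toml. Let's edit it so that the nvidia runtime can be called (lines starting with - are removed, lines starting with + are added):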

    [plugins."io.containerd.grpc.v1.cri".containerd]
      snapshotter = "overlayfs"
-     default_runtime_name = "runc"
+     default_runtime_name = "nvidia"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        # This section is added by system, we can just ignore it.
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          runtime_engine = ""
          runtime_root = ""
          privileged_without_host_devices = false
          base_runtime_spec = ""
+       [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
+         runtime_type = "io.containerd.runc.v2"
+         runtime_engine = ""
+         runtime_root = ""
+         privileged_without_host_devices = false
+         base_runtime_spec = ""
+         [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
+           BinaryName = "nvidia-container-runtime"
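If the default_runtime_name change gets lost, containerd simply keeps using plain runc and no GPU shows up in the containers, so a quick grep before restarting can save some head-scratching. This is a rough sketch, not a TOML parser; check_nvidia_runtime_config is a hypothetical name, and it only looks for the lines added in the diff above.

```shell
# check_nvidia_runtime_config is a hypothetical helper; it greps for the lines the
# diff above adds. It is a plain-text check, not a TOML parser.
check_nvidia_runtime_config() {
  conf=${1:-/etc/containerd/config.toml}
  rc=0
  # The CRI plugin's default runtime should be "nvidia".
  grep -Eq '^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"nvidia"' "$conf" \
    || { echo 'default_runtime_name is not "nvidia"'; rc=1; }
  # The nvidia runtime needs its own options table...
  grep -Fq '[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]' "$conf" \
    || { echo 'runtimes.nvidia.options section is missing'; rc=1; }
  # ...with BinaryName pointing at the nvidia-container-runtime wrapper.
  grep -Eq '^[[:space:]]*BinaryName[[:space:]]*=[[:space:]]*"nvidia-container-runtime"' "$conf" \
    || { echo 'BinaryName is not "nvidia-container-runtime"'; rc=1; }
  return "$rc"
}
```

For example: check_nvidia_runtime_config /etc/containerd/config.toml && sudo systemctl restart containerd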

Caveat

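This one bit me: at the moment, setting SystemdCgroup = true in containerd makes Pods fail to start with an error. It looks like this will be fixed in nvidia-container-runtime 1.4.0, due out next month, so for now we wait. There is an issue about it on GitHub.

Until then, you also can't use systemd as the cgroup driver with kubeadm (the kubelet's cgroup driver has to match containerd's), but that can't be helped.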
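SystemdCgroup = true is easy to copy in from other setup guides, so here's a small sketch that flags it. warn_systemd_cgroup is a hypothetical helper, and it is just a grep over the config file.

```shell
# warn_systemd_cgroup is a hypothetical helper: it reports (and returns non-zero)
# when SystemdCgroup = true appears anywhere in the containerd config.
warn_systemd_cgroup() {
  conf=${1:-/etc/containerd/config.toml}
  # Refuse to report "OK" for a file we could not read.
  [ -r "$conf" ] || { echo "cannot read $conf"; return 2; }
  if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$conf"; then
    echo "WARNING: SystemdCgroup = true is set in $conf"
    return 1
  fi
  echo "OK: SystemdCgroup = true not found in $conf"
  return 0
}
```

If you stay on cgroupfs here, keep the kubelet on cgroupfs as well, since the kubelet and the container runtime need to use the same cgroup driver.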

Once that's done, restart containerd, and then bring up Kubernetes.

sudo systemctl restart containerd
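Before bringing up Kubernetes, a quick preflight check that everything from the prerequisites list is on PATH can help. A minimal sketch: check_gpu_node_prereqs is a hypothetical name, and runc is included on the assumption that it was installed along with the containerd.io package.

```shell
# check_gpu_node_prereqs is a hypothetical preflight helper.
# runc is included on the assumption that it came with the containerd.io package.
check_gpu_node_prereqs() {
  missing=0
  for cmd in nvidia-smi nvidia-container-runtime containerd runc; do
    # command -v prints the resolved path and succeeds only if cmd is on PATH.
    if found_path=$(command -v "$cmd"); then
      echo "found:   $cmd ($found_path)"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}
```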

Installing NVIDIA/k8s-device-plugin


Next, install the NVIDIA Device Plugin, the component that lets Kubernetes handle GPUs. Device Plugins have been the subject of a talk at Kubernetes Meetup Tokyo, so if you're curious, the slides are worth a look.

Caveat

At the moment, v0.7.2 fails with an error. Don't rely on what's on master; use the previous release, v0.7.1.

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.7.1/nvidia-device-plugin.yml

Once it's applied, the node setup succeeds like this:

# kubectl get pod -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
cilium-hjzwk                              1/1     Running   0          2m2s
cilium-operator-794d86bbb4-qs7jt          0/1     Pending   0          2m2s
cilium-operator-794d86bbb4-w7j26          1/1     Running   0          2m2s
coredns-74ff55c5b-2gjvl                   1/1     Running   0          2m39s
coredns-74ff55c5b-tn4jz                   1/1     Running   0          2m39s
etcd-inductor-ubuntu                      1/1     Running   0          2m32s
kube-apiserver-inductor-ubuntu            1/1     Running   0          2m32s
kube-controller-manager-inductor-ubuntu   1/1     Running   0          2m32s
kube-scheduler-inductor-ubuntu            1/1     Running   0          2m32s
nvidia-device-plugin-daemonset-xggrc      1/1     Running   0          26s
# kubectl logs -n kube-system nvidia-device-plugin-daemonset-xggrc
2020/12/12 18:46:31 Loading NVML
2020/12/12 18:46:31 Starting FS watcher.
2020/12/12 18:46:31 Starting OS watcher.
2020/12/12 18:46:31 Retreiving plugins.
2020/12/12 18:46:31 Starting GRPC server for 'nvidia.com/gpu'
2020/12/12 18:46:31 Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
2020/12/12 18:46:31 Registered device plugin for 'nvidia.com/gpu' with Kubelet
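When the plugin doesn't come up, its log is the first place to look. The sketch below uses a hypothetical helper, device_plugin_status, that just searches kubectl logs output for the two lines shown above, so it depends on this version's log wording.

```shell
# device_plugin_status is a hypothetical helper. It reads device plugin logs on
# stdin and looks for the two lines shown above, so it depends on this log wording.
device_plugin_status() {
  logs=$(cat)
  # Socket path from "Starting to serve 'nvidia.com/gpu' on <path>" (first match only).
  sock=$(printf '%s\n' "$logs" | sed -n "s|.*Starting to serve 'nvidia.com/gpu' on ||p" | head -n 1)
  if printf '%s\n' "$logs" | grep -q "Registered device plugin for 'nvidia.com/gpu' with Kubelet"; then
    echo "registered (socket: ${sock:-unknown})"
    return 0
  fi
  echo "not registered yet"
  return 1
}
```

Usage: kubectl logs -n kube-system nvidia-device-plugin-daemonset-xggrc | device_plugin_status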

At this point, you should be able to schedule Pods that use the GPU.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:9.0-devel
      resources:
        limits:
          nvidia.com/gpu: 1

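I scheduled the YAML above onto the GPU node. The container has nothing to run, so it exits right away and keeps restarting, but scheduling and container startup both work, so I'm calling this a success.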

Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  53s              default-scheduler  Successfully assigned default/gpu-pod to inductor-ubuntu
  Normal   Pulling    52s              kubelet            Pulling image "nvidia/cuda:9.0-devel"
  Normal   Pulled     8s               kubelet            Successfully pulled image "nvidia/cuda:9.0-devel" in 44.252889118s
  Normal   Created    6s (x2 over 7s)  kubelet            Created container cuda-container
  Normal   Started    6s (x2 over 7s)  kubelet            Started container cuda-container
  Normal   Pulled     6s               kubelet            Container image "nvidia/cuda:9.0-devel" already present on machine
  Warning  BackOff    4s (x2 over 5s)  kubelet            Back-off restarting failed container
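If you'd rather see the Pod do something visible with the GPU instead of going into BackOff like above, one option is to run nvidia-smi once with restartPolicy: Never. A sketch with a hypothetical gpu_smoke_pod helper that prints the manifest; it assumes nvidia-smi gets injected into the container by nvidia-container-runtime, which I have not verified for this particular image.

```shell
# gpu_smoke_pod is a hypothetical helper that prints a variant of the Pod above:
# it runs nvidia-smi once and stops (restartPolicy: Never) instead of crash-looping.
# Assumption: nvidia-smi is injected into the container by nvidia-container-runtime.
gpu_smoke_pod() {
  name=${1:-gpu-smoke}
  image=${2:-nvidia/cuda:9.0-devel}
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: ${image}
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}
```

Usage: gpu_smoke_pod | kubectl apply -f - and, once the Pod has finished, kubectl logs gpu-smoke. If the GPU was wired through, the log should contain nvidia-smi output.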

Fetching the node information shows that containerd is the container runtime.

# kubectl get node -o wide
NAME              STATUS   ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
gpu-node          Ready    control-plane,master   36m   v1.20.0   192.168.50.12   <none>        Ubuntu 20.04.1 LTS   5.4.0-58-generic   containerd://1.4.3
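To check this per node from a script, the same value is exposed as .status.nodeInfo.containerRuntimeVersion. A small sketch with a hypothetical runtime_of helper that splits it into runtime name and version.

```shell
# runtime_of is a hypothetical helper. It splits a node's containerRuntimeVersion
# (for example "containerd://1.4.3") into the runtime name and its version.
runtime_of() {
  case "$1" in
    # Left of "://" is the runtime, right of it is the version.
    *://*) echo "runtime=${1%%://*} version=${1#*://}" ;;
    *) echo "unrecognized: $1"; return 1 ;;
  esac
}
```

Usage: runtime_of "$(kubectl get node gpu-node -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}')", which for the node above should print runtime=containerd version=1.4.3.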

So that's my share: how to schedule Pods that use a GPU without Docker. It's a little sad to see Docker disappear from Kubernetes, but there are upsides too, so if you have the time, give it a try!

Thanks for reading!

inductor's blog

nothing but self note :)

What you need to know to migrate from NVIDIA Docker (NVIDIA Container Toolkit) to nvidia-container-runtime + containerd