.. _helm_chart: ########################### Helm Chart for NVIDIA FLARE ########################### Sometimes, users would like to deploy NVIDIA FLARE to an existing Kubernetes cluster. Now the provisioning tool includes a new builder ``HelmChartBuilder`` that can generate a reference Helm Chart for users to deploy NVIDIA FLARE to a local microk8s Kubernetes instance. .. note:: The generated Helm Chart is a starting point and serves as a reference. Depending on the Kubernetes cluster, users may need to modify and/or to perform additional operations to successfully deploy the chart. .. note:: The following document assumes users have microk8s (common bundle in ubuntu server 20.04 and above) running on his local machine. With the helm chart, users are able to start the overseer, servers in the k8s cluster after provisioning. The clients and admin console can connect to the overseer and servers in the k8s cluster. *************************** Update on provisioning tool *************************** In order to generate the helm chart, add the HelmChartBuilder to the project.yml file. .. code-block:: yaml - path: nvflare.lighter.impl.helm_chart.HelmChartBuilder args: docker_image: localhost:32000/nvfl-min:0.0.1 The ``docker_image`` is the actual image used for all pods running in the k8s. The provisioners have to build it separately and make sure it is available to the k8s cluster. For microk8s, enabling the docker registry server by running this: .. code-block:: shell microk8s enable registry This will create a registry server listening to port 32000 ******************** Provisioning results ******************** Running provision command as usual, either in the new format ``nvflare provision`` or just ``provision``. After the command, there should a folder with structure similar to the following: .. code-block:: shell $ tree -L 1 . ├── admin@nvidia.com ├── compose.yaml ├── nvflare_compose ├── nvflare_hc ├── overseer ├── server1 ├── server2 ├── site-1 └── site-2 8 directories, 1 file Note there is a nvflare_hc folder. This folder is the Helm Chart package. ****************** Preparing microk8s ****************** Enabling microk8s addons ======================== NVIDIA FLARE Helm Chart depends on a few services (aka addons in microk8s) provided by the Kubernetes cluster. Please check if they are enabled. .. code-block:: shell $ microk8s status microk8s is running high-availability: no datastore master nodes: 127.0.0.1:19001 datastore standby nodes: none addons: enabled: dns # (core) CoreDNS ha-cluster # (core) Configure high availability on the current node helm3 # (core) Helm 3 - Kubernetes package manager hostpath-storage # (core) Storage class; allocates storage from host directory ingress # (core) Ingress controller for external access registry # (core) Private image registry exposed on localhost:32000 storage # (core) Alias to hostpath-storage add-on, deprecated disabled: community # (core) The community addons repository dashboard # (core) The Kubernetes dashboard gpu # (core) Automatic enablement of Nvidia CUDA helm # (core) Helm 2 - the package manager for Kubernetes host-access # (core) Allow Pods connecting to Host services smoothly mayastor # (core) OpenEBS MayaStor metallb # (core) Loadbalancer for your Kubernetes cluster metrics-server # (core) K8s Metrics Server for API access to service metrics prometheus # (core) Prometheus operator for monitoring and logging rbac # (core) Role-Based Access Control for authorisation If any of the enabled services are not enabled in your environment, please enable it. The following example shows how to enable helm3 addon. .. code-block:: shell $ microk8s enable helm3 Infer repository core for addon helm3 Enabling Helm 3 Fetching helm version v3.8.0. % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 12.9M 100 12.9M 0 0 11.5M 0 0:00:01 0:00:01 --:--:-- 11.5M Helm 3 is enabled Allowing network traffic ======================== We have to change the cluster to allow incoming network traffic, such as those from admin consoles and NVIDIA FLARE clients, to enter the cluster. After the network traffic enters the cluster, the cluster also needs to know how to route the traffice to the deployed services. Users have to enable ingress controller and modify some configuration of microk8s cluster. Complete the following steps to enable microk8s to open and route network traffic to overseer and servers. Edit configmap of ingress to route traffic ------------------------------------------ .. code-block:: shell $ microk8s kubectl edit cm nginx-ingress-tcp-microk8s-conf -n ingress Add this section to the configmap .. code-block:: yaml data: "8002": default/server1:8002 "8003": default/server1:8003 "8102": default/server2:8102 "8103": default/server2:8103 "8443": default/overseer:8443 Edit DaemonSet of ingress to open ports --------------------------------------- .. code-block:: shell $ microk8s kubectl edit ds nginx-ingress-microk8s-controller -n ingress Add this section at (spec.template.spec.containers[0].ports) .. code-block:: yaml - containerPort: 8443 hostPort: 8443 name: overseer protocol: TCP - containerPort: 8002 hostPort: 8002 name: server1fl protocol: TCP - containerPort: 8003 hostPort: 8003 name: server1adm protocol: TCP - containerPort: 8102 hostPort: 8102 name: server2fl protocol: TCP - containerPort: 8103 hostPort: 8103 name: server2adm protocol: TCP ********************* Installing helm chart ********************* To install the helm chart, with microk8s environment, run the following command in the same directory as previous section. .. code-block:: shell $ mkdir -p /tmp/nvflare $ microk8s helm3 install --set workspace=$(pwd) --set svc-persist=/tmp/nvflare nvflare-helm-chart-demo nvflare_hc/ NAME: nvflare-helm-chart-demo LAST DEPLOYED: Fri Sep 23 12:28:24 2022 NAMESPACE: default STATUS: deployed REVISION: 1 TEST SUITE: None Here the ``nvflare-helm-chart-demo`` is the name we choose for this installed application. You can choose a different name so that it's easy to recognize the deployed application. The ``nvflare_hc/`` is the folder provisioning tool generated, as shown in the previous section. You can take a look at files in that folder and feel free to change them for your own environment. .. note:: Here we use the host's /tmp/nvflare as the persist storage space for all pods in microk8s. Please make sure that directory exists before running the above command **************************************** Verifying NVIDIA FLARE is up and running **************************************** You can use ``kubectl`` to check the status of NVIDIA FLARE application, installed by the chart. For example, in microk8s environment, run the following command to see if overseer and servers are started. .. code-block:: shell $ microk8s kubectl get pods NAME READY STATUS RESTARTS AGE dnsutils 1/1 Running 74 (13m ago) 62d server1-7675668544-xvfvp 1/1 Running 0 4m50s overseer-6f9dd66c97-n7bkd 1/1 Running 0 4m50s server2-86bc4fc87f-s9n2s 1/1 Running 0 4m50s The ``dnsutils`` is a built-in addon for dns service inside microk8s. You can ignore it. For more details on the pods inside Kubernetes cluster, you can run the following command. .. code-block:: shell $ microk8s kubectl describe pods Name: dnsutils Namespace: default Priority: 0 Node: demolaptop/192.168.1.96 Start Time: Fri, 22 Jul 2022 13:36:54 -0700 Labels: Annotations: cni.projectcalico.org/containerID: 9cfa2cfbb4ef7b11b10c5793965e2a42682dea5d0b05b4454b4232da9ded6a8e cni.projectcalico.org/podIP: 10.1.179.67/32 cni.projectcalico.org/podIPs: 10.1.179.67/32 Status: Running IP: 10.1.179.67 IPs: IP: 10.1.179.67 Containers: dnsutils: Container ID: containerd://3c31a42f9c5dc10452d2af0a503682cd78e25a4b078877f96a1174d1156a23a5 Image: k8s.gcr.io/e2e-test-images/jessie-dnsutils:1.3 Image ID: k8s.gcr.io/e2e-test-images/jessie-dnsutils@sha256:8b03e4185ecd305bc9b410faac15d486a3b1ef1946196d429245cdd3c7b152eb Port: Host Port: Command: sleep 3600 State: Running Started: Fri, 23 Sep 2022 12:19:55 -0700 Last State: Terminated Reason: Unknown Exit Code: 255 Started: Thu, 18 Aug 2022 11:18:34 -0700 Finished: Fri, 23 Sep 2022 12:19:25 -0700 Ready: True Restart Count: 74 Environment: Mounts: /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-f4sxs (ro) Conditions: Type Status Initialized True Ready True ContainersReady True PodScheduled True Volumes: kube-api-access-f4sxs: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: DownwardAPI: true QoS Class: BestEffort Node-Selectors: Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Name: server1-7675668544-xvfvp Namespace: default Priority: 0 Node: demolaptop/192.168.1.96 Start Time: Fri, 23 Sep 2022 12:28:25 -0700 Labels: pod-template-hash=7675668544 system=server1 Annotations: cni.projectcalico.org/containerID: 7493a356143ad0c4e4fdbe781d995c01d52c4caa31e961066d4a8769dfa1d360 cni.projectcalico.org/podIP: 10.1.179.94/32 cni.projectcalico.org/podIPs: 10.1.179.94/32 Status: Running IP: 10.1.179.94 IPs: IP: 10.1.179.94 Controlled By: ReplicaSet/server1-7675668544 Containers: server1: Container ID: containerd://16928775549dbf9cb2d68eea6412e682a170f72b5dbcdbf8c56790c8b9a30fd5 Image: localhost:32000/nvfl-min:0.0.1 Image ID: localhost:32000/nvfl-min@sha256:71658dc82b15e6cd5a2580c78e56011d166a70e1ff098306c93584c82cb63821 Ports: 8002/TCP, 8003/TCP Host Ports: 0/TCP, 0/TCP Command: /usr/local/bin/python3 Args: -u -m nvflare.private.fed.app.server.server_train -m /workspace/server1 -s fed_server.json --set secure_train=true config_folder=config State: Running Started: Fri, 23 Sep 2022 12:28:27 -0700 Ready: True Restart Count: 0 Environment: Mounts: /tmp/nvflare from svc-persist (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hkhhq (ro) /workspace from workspace (rw) Conditions: Type Status Initialized True Ready True ContainersReady True PodScheduled True Volumes: workspace: Type: HostPath (bare host directory volume) Path: /home/nvflare/workspace/nvf_hc_test/demo HostPathType: Directory svc-persist: Type: HostPath (bare host directory volume) Path: /tmp/nvflare HostPathType: Directory kube-api-access-hkhhq: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: DownwardAPI: true QoS Class: BestEffort Node-Selectors: Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Name: overseer-6f9dd66c97-n7bkd Namespace: default Priority: 0 Node: demolaptop/192.168.1.96 Start Time: Fri, 23 Sep 2022 12:28:25 -0700 Labels: pod-template-hash=6f9dd66c97 system=overseer Annotations: cni.projectcalico.org/containerID: e9f6f2efb548c16217377eaaa8b79534a67e016277c3a0933d202d04904f46dc cni.projectcalico.org/podIP: 10.1.179.80/32 cni.projectcalico.org/podIPs: 10.1.179.80/32 Status: Running IP: 10.1.179.80 IPs: IP: 10.1.179.80 Controlled By: ReplicaSet/overseer-6f9dd66c97 Containers: overseer: Container ID: containerd://82426e5e414b863fff1cc4c8963a3e18acd49ff1ccb51befaf5c984f3ad0f1a4 Image: localhost:32000/nvfl-min:0.0.1 Image ID: localhost:32000/nvfl-min@sha256:71658dc82b15e6cd5a2580c78e56011d166a70e1ff098306c93584c82cb63821 Port: 8443/TCP Host Port: 0/TCP Command: /workspace/overseer/startup/start.sh State: Running Started: Fri, 23 Sep 2022 12:28:27 -0700 Ready: True Restart Count: 0 Environment: Mounts: /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dz7qz (ro) /workspace from workspace (rw) Conditions: Type Status Initialized True Ready True ContainersReady True PodScheduled True Volumes: workspace: Type: HostPath (bare host directory volume) Path: /home/nvflare/workspace/nvf_hc_test/demo HostPathType: Directory kube-api-access-dz7qz: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: DownwardAPI: true QoS Class: BestEffort Node-Selectors: Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Name: server2-86bc4fc87f-s9n2s Namespace: default Priority: 0 Node: demolaptop/192.168.1.96 Start Time: Fri, 23 Sep 2022 12:28:25 -0700 Labels: pod-template-hash=86bc4fc87f system=server2 Annotations: cni.projectcalico.org/containerID: 8ac76a0bfad2e4f0b1de9115f0d46c1a0dbacabb847c6160b1f144e82720fe99 cni.projectcalico.org/podIP: 10.1.179.96/32 cni.projectcalico.org/podIPs: 10.1.179.96/32 Status: Running IP: 10.1.179.96 IPs: IP: 10.1.179.96 Controlled By: ReplicaSet/server2-86bc4fc87f Containers: server2: Container ID: containerd://c1e530fc6fc320d9b9388d81727440324cc11e0bb61e3b3e76a2362638f89357 Image: localhost:32000/nvfl-min:0.0.1 Image ID: localhost:32000/nvfl-min@sha256:71658dc82b15e6cd5a2580c78e56011d166a70e1ff098306c93584c82cb63821 Ports: 8102/TCP, 8103/TCP Host Ports: 0/TCP, 0/TCP Command: /usr/local/bin/python3 Args: -u -m nvflare.private.fed.app.server.server_train -m /workspace/server2 -s fed_server.json --set secure_train=true config_folder=config State: Running Started: Fri, 23 Sep 2022 12:28:28 -0700 Ready: True Restart Count: 0 Environment: Mounts: /tmp/nvflare from svc-persist (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6cwbh (ro) /workspace from workspace (rw) Conditions: Type Status Initialized True Ready True ContainersReady True PodScheduled True Volumes: workspace: Type: HostPath (bare host directory volume) Path: /home/nvflare/workspace/nvf_hc_test/demo HostPathType: Directory svc-persist: Type: HostPath (bare host directory volume) Path: /tmp/nvflare HostPathType: Directory kube-api-access-6cwbh: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: DownwardAPI: true QoS Class: BestEffort Node-Selectors: Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: ************************ Login with admin console ************************ Now on another terminal, with nvflare installed and /etc/hosts modified to include the IP of overseer, server1 and server2, which is the IP of the machine running the microk8s cluster, run fl_admin.sh of admin@nvidia.com/startup. Login as admin@nvidia.com. For example: /etc/hosts is modified as (if microk8s is running at 192.168.1.123 and clients and admin console is running at slowdesktop machine) .. code-block:: shell $ cat /etc/hosts 127.0.0.1 localhost 127.0.1.1 slowdesktop 192.168.1.123 overseer server1 server2 # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters *********************** Uninstalling helm chart *********************** Users can uninstall the chart by running (note ``nvflare-helm-chart-demo`` is the release name we used when installing the chart) .. code-block:: shell $ microk8s helm3 uninstall nvflare-helm-chart-demo