Confidential Azure Container Instances Deployment Guide

Overview

This guide walks you through the complete deployment of confidential Azure Container Instances (ACI). This type of deployment enables NVFlare servers to perform secure aggregation by using confidential ACI features such as disabled login and verification of container images before instances are launched, along with other confidential computing features.

Note

Launching confidential ACI requires your Azure account to have certain permissions. Please consult your Azure account administrator and the Azure documentation for more information.

Steps for Launching The NVFlare Server in Confidential ACI

  • Login and create one resource group

  • Create one Azure container registry

  • Build container images

  • Publish container images to Azure container registry

  • Create and run the confidential ACI launch script

Login and create one resource group

First, log in to Azure with the Azure CLI (az). Then create one resource group to host all resources generated by the following operations.

You can choose another name for the resource group and another location.

#!/usr/bin/env bash

resource_group=cc-prep-rsr-grp
location=eastus

az login

az group create --name $resource_group --location $location

Create One Azure Container Registry (ACR)

With the resource group created, we next create one Azure Container Registry (ACR), to which the built images are pushed and from which they are pulled.

#!/usr/bin/env bash

resource_group=cc-prep-rsr-grp
location=eastus
reg_name=ccprepreg
dnl_scope=Unsecure

az group create --name $resource_group --location $location

az acr create --resource-group $resource_group \
--name $reg_name --sku Standard \
--dnl-scope $dnl_scope
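Azure requires registry names to be 5 to 50 characters long and to contain only letters and digits, so a name like cc-prep-reg is rejected. The following is a minimal sketch of a local check that could run before az acr create; the helper name valid_acr_name is hypothetical and not part of the Azure CLI.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 if the name satisfies the ACR naming rule
# (5-50 characters, letters and digits only), 1 otherwise.
valid_acr_name() {
  [[ "$1" =~ ^[A-Za-z0-9]{5,50}$ ]]
}

reg_name=ccprepreg
if valid_acr_name "$reg_name"; then
  echo "registry name '$reg_name' looks valid"
else
  echo "registry name '$reg_name' is invalid" >&2
fi
```

This check only covers the format of the name. Registry names must also be globally unique, which can only be confirmed against Azure itself.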

Note

Later steps require elevated permissions on the Azure Container Registry (ACR). Please check that you have at least the “Contributor” role on the ACR. Roles with fewer permissions may cause the following steps to fail.
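One way to check the role up front is to list your role assignments on the registry and look for Contributor or Owner. The sketch below keeps the matching logic in a small hypothetical helper, has_contributor_role, and shows the Azure CLI calls only as commented usage because they need a logged-in session; the assignee placeholder must be replaced with your own user or object ID.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads role names (one per line) on stdin and succeeds
# if at least one of them is exactly Contributor or Owner.
has_contributor_role() {
  grep -qxE 'Contributor|Owner'
}

# Example use with the Azure CLI (not executed here; requires `az login`):
#   acr_id=$(az acr show --name ccprepreg --query id -o tsv)
#   az role assignment list --assignee "<your-user-or-object-id>" \
#       --scope "$acr_id" --include-inherited \
#       --query "[].roleDefinitionName" -o tsv | has_contributor_role \
#     && echo "role OK" || echo "Contributor role on the ACR is missing"
```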

Build Docker Container Images

The container image we are going to build includes the startup kit for the NVFlare server. Therefore, please obtain that set of files and copy the startup kit to a folder inside the current working directory. In the following example, the server’s startup kit is stored in the nvflserver.eastus.azurecontainer.io folder and the Dockerfile in the docker folder, both under the current working directory, as shown below.

$ tree -d -L 1
.
├── docker
└── nvflserver.eastus.azurecontainer.io

The following script builds the container image.

#!/usr/bin/env bash

tag=0.0.1
name=cc_prep
reg_name=ccprepreg
registry=${reg_name}.azurecr.io
nvfl_root=nvflserver.eastus.azurecontainer.io

docker build --build-arg NVFL_ROOT=$nvfl_root -t $registry/$name:$tag -f docker/Dockerfile .

The Dockerfile (docker/Dockerfile) used by the build script is shown below.

FROM python:3.10
ARG NVFL_ROOT=nvflserver.eastus.azurecontainer.io
WORKDIR /workspace
RUN python3 -m pip install --no-cache-dir nvflare
COPY $NVFL_ROOT nvflare
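Because the Dockerfile copies the startup kit folder from the build context, docker build fails late if that folder or the Dockerfile is missing. A minimal sketch of a pre-build check follows; the helper name check_build_context is hypothetical, and the paths are the ones used in the example above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: verifies the build context before `docker build`.
# $1 = startup kit folder, $2 = path to the Dockerfile.
check_build_context() {
  local kit_dir=$1 dockerfile=$2
  [ -d "$kit_dir" ]    || { echo "missing startup kit folder: $kit_dir" >&2; return 1; }
  [ -f "$dockerfile" ] || { echo "missing Dockerfile: $dockerfile" >&2; return 1; }
}

# Usage (not executed here):
#   check_build_context nvflserver.eastus.azurecontainer.io docker/Dockerfile \
#     && docker build --build-arg NVFL_ROOT=nvflserver.eastus.azurecontainer.io \
#          -t ccprepreg.azurecr.io/cc_prep:0.0.1 -f docker/Dockerfile .
```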

The following is a sample cc_server.yml file, which is used together with project.yml for confidential computing (CC) provisioning. A sample project.yml file is shown after the cc_server.yml file. Note that the sample project.yml file also references cc_site-1.yml, as described in Azure Confidential Virtual Machine Deployment Guide - Creating Azure confidential virtual machines.

compute_env: azure_confidential_container
cc_cpu_mechanism: amd_sev_snp
role: server
cc_issuers:
  - id: aci_authorizer
    path: nvflare.app_opt.confidential_computing.aci_authorizer.ACIAuthorizer
    token_expiration: 100 # seconds, needs to be less than check_frequency

The following is the sample project.yml file.

api_version: 3
name: example_project
description: NVIDIA FLARE sample project yaml file
participants:
  # Change the name of the server (server1) to the Fully Qualified Domain Name
  # (FQDN) of the server, for example: server1.example.com.
  # Ensure that the FQDN is correctly mapped in the /etc/hosts file.
  - name: server1
    type: server
    org: nvidia
    fed_learn_port: 8002
    cc_config: cc_server.yml
  - name: site-1
    type: client
    org: nvidia
    cc_config: cc_site-1.yml
    # Specifying listening_host will enable the creation of one pair of
    # certificate/private key for this client, allowing the client to function
    # as a server for 3rd-party integration.
    # The value must be a hostname that the external trainer can reach via the network.
    # listening_host: site-1-lh
  - name: admin@nvidia.com
    type: admin
    org: nvidia
    role: project_admin
# The same methods in all builders are called in the order defined in the builders section
builders:
  - path: nvflare.lighter.impl.workspace.WorkspaceBuilder
  - path: nvflare.lighter.impl.static_file.StaticFileBuilder
    args:
      # config_folder can be set to inform NVIDIA FLARE where to get configuration
      config_folder: config
      # scheme for communication driver (currently supporting the default, grpc, only).
      # scheme: grpc

      # app_validator is used to verify if uploaded app has proper structures
      # if not set, no app_validator is included in fed_server.json
      # app_validator: PATH_TO_YOUR_OWN_APP_VALIDATOR
  - path: nvflare.lighter.impl.cert.CertBuilder
  - path: nvflare.lighter.cc_provision.impl.cc.CCBuilder
  - path: nvflare.lighter.impl.signature.SignatureBuilder
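After provisioning with a project file like the one above, the server's startup kit has to be copied next to the docker folder for the image build. The sketch below assumes the usual NVFlare provisioning layout, where each run writes a new prod_NN folder (prod_00, prod_01, ...) under workspace/<project name>; the helper latest_prod_dir is hypothetical and simply picks the highest-numbered folder.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the highest-numbered prod_NN directory under a
# provisioning output directory such as workspace/example_project.
# Returns 1 if no prod_* directory exists.
latest_prod_dir() {
  local base=$1 dirs
  shopt -s nullglob
  dirs=("$base"/prod_*/)          # glob results are sorted lexicographically
  shopt -u nullglob
  (( ${#dirs[@]} )) || return 1
  printf '%s\n' "${dirs[${#dirs[@]}-1]%/}"
}

# Usage (not executed here; assumes the server name in project.yml was
# changed to the FQDN nvflserver.eastus.azurecontainer.io):
#   nvflare provision -p project.yml -w workspace
#   cp -r "$(latest_prod_dir workspace/example_project)/nvflserver.eastus.azurecontainer.io" .
```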

Publish Container Images to Azure Container Registry

After the above steps, we have one container image built and stored locally. Now we will push it to the ACR we created in step 2. This step requires you to obtain an access token from the ACR and use it to log in to the ACR.

#!/usr/bin/env bash

reg_name=ccprepreg
reg_token_file=reg_token.json

az acr login --name $reg_name --expose-token > $reg_token_file

echo "ACR reg token saved to $reg_token_file"

With the token file available, you can log in to the ACR.

#!/usr/bin/env bash

reg_name=ccprepreg
registry=${reg_name}.azurecr.io
reg_token_file=reg_token.json

reg_token=$(jq -r .accessToken $reg_token_file)

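# Pass the token on stdin so it does not appear in the process list.
printf '%s' "$reg_token" | docker login "$registry" -u 00000000-0000-0000-0000-000000000000 --password-stdin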
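If the token file is missing the accessToken field, jq -r prints the literal string null, and docker login then fails with a less obvious authentication error. A minimal sketch of a guard that could run between the jq call and docker login follows; the helper name require_token is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: fails if the token is empty or the literal "null"
# that `jq -r` prints for a missing key.
require_token() {
  local token=$1
  if [ -z "$token" ] || [ "$token" = "null" ]; then
    echo "no access token found; rerun az acr login --expose-token" >&2
    return 1
  fi
}

# Usage (not executed here):
#   reg_token=$(jq -r .accessToken reg_token.json)
#   require_token "$reg_token" || exit 1
```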

Then you can push your newly built container image to the ACR.

#!/usr/bin/env bash

tag=0.0.1
name=cc_prep
reg_name=ccprepreg
registry=${reg_name}.azurecr.io

docker push $registry/$name:$tag
docker push $registry/skr:2.7

Note

The skr:2.7 image is built from the Microsoft open source project at https://github.com/microsoft/confidential-sidecar-containers. Please check its documentation for how to build the skr image, and tag the image with your registry name before pushing it.

Create And Run The Confidential ACI Launch Script

This step requires the confcom extension for the Azure CLI. You can install it with the following command.

az extension add --name confcom

The following script injects the registry credential and the container image verification information (the CCE policy) into a copy of the base Azure Resource Manager (ARM) template, ccprep.json, writes the result to cce_done_p.json, and deploys it.

#!/usr/bin/env bash

resource_group=cc-prep-rsr-grp
base_file=ccprep.json
reg_token_file=reg_token.json

# Export the registry token so jq can read it through env.registry_token.
export registry_token=$(jq -r .accessToken "$reg_token_file")

# Insert the registry password into the base ARM template.
jq '.resources[0].properties.imageRegistryCredentials[0].password = env.registry_token' \
  "$base_file" > tmp.json

# Generate the base64-encoded CCE policy for the images referenced in the template.
az confcom acipolicygen -a tmp.json --print-policy > cce_token.b64

# Insert the CCE policy into the template.
export cce_token=$(cat cce_token.b64)
jq '.resources[0].properties.confidentialComputeProperties.ccePolicy = env.cce_token' \
  tmp.json > cce_done_p.json

az deployment group create --resource-group "$resource_group" --template-file cce_done_p.json
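Note that tmp.json and cce_done_p.json both embed the registry access token in plain text. As a sketch, assuming you do not need these files after the deployment, a small cleanup helper can be registered with trap so they are removed when the script exits; the helper name cleanup_secrets is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: removes the given files, ignoring ones that do not exist.
cleanup_secrets() {
  rm -f -- "$@"
}

# In the launch script this could be registered right after the shebang
# (not executed here):
#   trap 'cleanup_secrets tmp.json cce_done_p.json' EXIT
```

The token file reg_token.json from the earlier step can be removed the same way once the images are pushed and the deployment has finished.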

The script requires a base ARM template file (ccprep.json in the script above), as shown below. All token and verification information has been removed from this sample file. You need to edit it to match your confidential ACI and NVFlare server configuration. In particular, the port number, location, container image names, and confidential ACI resources must be updated according to your own account and previous settings.

If you encounter permission issues or the script gets stuck, please check your role in the ACR.

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "name": {
      "type": "string",
      "defaultValue": "NVFLARE-CC",
      "metadata": {
        "description": "Name for the container group"
      }
    },
    "location": {
      "type": "string",
      "defaultValue": "East US",
      "metadata": {
        "description": "Location for all resources."
      }
    },
    "containerRegistry": {
      "defaultValue": "ccprepreg.azurecr.io",
      "type": "string",
      "metadata": {
        "description": "The container registry login server."
      }
    },
    "MAAEndpoint": {
      "defaultValue": "sharedneu.neu.attest.azure.net",
      "type": "string",
      "metadata": {
        "description": "Proxy sidecar MAA endpoint"
      }
    },
    "restartPolicy": {
      "type": "string",
      "defaultValue": "Never",
      "allowedValues": [
        "Always",
        "Never",
        "OnFailure"
      ],
      "metadata": {
        "description": "The behavior of Azure runtime if container has stopped."
      }
    }
  },
  "resources": [
    {
      "type": "Microsoft.ContainerInstance/containerGroups",
      "apiVersion": "2022-10-01-preview",
      "name": "[parameters('name')]",
      "location": "[parameters('location')]",
      "properties": {
        "confidentialComputeProperties": {
          "ccePolicy": ""
        },
        "sku": "Confidential",
        "containers": [
          {
            "name": "nvflare-server",
            "properties": {
              "image": "ccprepreg.azurecr.io/cc_prep:0.0.1",
              "command": [
                "/usr/local/bin/python3",
                "-u",
                "-m",
                "nvflare.private.fed.app.server.server_train",
                "-m",
                "/workspace/nvflare",
                "-s",
                "fed_server.json",
                "--set",
                "secure_train=true",
                "config_folder=config",
                "org=nvidia"
              ],
              "ports": [
                {
                  "port": 8002,
                  "protocol": "TCP"
                }
              ],
              "resources": {
                "requests": {
                  "cpu": 3.3,
                  "memoryInGB": 9.4
                }
              },
              "securityContext": {
                "privileged": true
              }
            }
          },
          {
            "name": "skr-sidecar",
            "properties": {
              "securityContext": {
                "privileged": true
              },
              "image": "ccprepreg.azurecr.io/skr:2.7",
              "command": [
                "/bin/sh",
                "skr.sh",
                "ewp9",
                "8284"
              ],
              "environmentVariables": [],
              "ports": [
                {
                  "port": 8284,
                  "protocol": "TCP"
                }
              ],
              "resources": {
                "requests": {
                  "cpu": 0.5,
                  "memoryInGB": 1.6
                }
              }
            }
          }
        ],
        "osType": "Linux",
        "restartPolicy": "[parameters('restartPolicy')]",
        "imageRegistryCredentials": [
          {
            "server": "ccprepreg.azurecr.io",
            "username": "00000000-0000-0000-0000-000000000000",
            "password": ""
          }
        ],
        "ipAddress": {
          "type": "Public",
          "ports": [
            {
              "port": 8002,
              "protocol": "TCP"
            }
          ],
          "dnsNameLabel": "nvflserver"
        }
      }
    }
  ],
  "outputs": {
    "containerIPv4Address": {
      "type": "string",
      "value": "[reference(resourceId('Microsoft.ContainerInstance/containerGroups', parameters('name'))).ipAddress.ip]"
    }
  }
}

The script takes a few minutes to finish. After it has finished, you can check that the confidential ACI is up and running on the Container Instances page of the Azure portal.
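The server name baked into the startup kit must match the public name of the container group. With the dnsNameLabel and location used in this guide, and assuming the default (non-hashed) DNS label scope, that name is formed as label.region.azurecontainer.io. The sketch below builds it with a hypothetical helper, aci_fqdn, and lists Azure CLI commands for checking the live state as commented usage only.

```shell
#!/usr/bin/env bash

# Hypothetical helper: builds the public FQDN for a DNS name label in a region,
# assuming the default label scope (no generated hash in the name).
aci_fqdn() {
  local label=$1 region=$2
  printf '%s.%s.azurecontainer.io\n' "$label" "$region"
}

aci_fqdn nvflserver eastus   # prints nvflserver.eastus.azurecontainer.io

# To query the running container group (not executed here; requires `az login`):
#   az container show --resource-group cc-prep-rsr-grp --name NVFLARE-CC \
#       --query "{state: instanceView.state, fqdn: ipAddress.fqdn}" -o table
#   az container logs --resource-group cc-prep-rsr-grp --name NVFLARE-CC \
#       --container-name nvflare-server
```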