Containerized Deployment with Docker
Docker support has two common uses:
Run FLARE inside a generic development container for simulation, notebooks, POC mode, or manual experiments. In this pattern, Docker provides the Python environment and filesystem, while FLARE processes still run normally inside that one container.
Prepare provisioned startup kits for Docker runtime execution. This is the recommended path when FLARE parent server/client processes should run in Docker and launch server/client jobs as separate Docker containers.
The current Docker runtime workflow is:
Build a parent image with NVFlare installed.
Build a job image with NVFlare and workload dependencies installed.
Provision server, client, and admin startup kits.
Run the Deploy Command, nvflare deploy prepare, on each server or client startup kit that should run in Docker.
Start prepared server/client kits with startup/start_docker.sh.
Submit jobs whose meta.json contains Docker settings in Launcher-Specific Execution Settings.
For a runnable end-to-end workflow, see the Docker job launcher example.
Prerequisites
Before starting with containerized deployment, ensure you have:
Docker installed on your system
NVIDIA Container Toolkit installed for GPU support
System requirements met as per the NVIDIA Container Toolkit Install Guide
Provisioned server/client startup kits when preparing a Docker runtime deployment
Running NVIDIA FLARE in a Docker container provides several benefits:
Consistent environment across different systems
Easy dependency management
Repeatable runtime preparation
Isolated execution environment
GPU support through NVIDIA Container Toolkit
Parent and Job Images
The Docker runtime separates parent containers from job containers:
The parent image runs the long-lived FLARE server process or client process from a prepared startup kit.
The job image runs server job and client job processes launched by
DockerJobLauncher.
The parent image is configured in the runtime docker.yaml used by
nvflare deploy prepare. The job image is configured in the submitted job’s
meta.json under launcher_spec. You can use the same image for both
roles, but keeping them separate is often cleaner: the parent image needs the
FLARE runtime and Docker SDK access, while the job image needs the workload
frameworks and training dependencies.
Docker Runtime Workflow
Build or publish the images that your sites will use. The runnable
Docker job launcher example builds an
example parent image, nvflare-site:latest, and an example job image,
nvflare-job:latest:
cd examples/docker
bash build_docker.sh
Provision the project:
nvflare provision -p project.yml
Create a Docker runtime config, here named docker.yaml, for nvflare deploy prepare:
runtime: docker
parent:
  docker_image: nvflare-site:latest
  network: nvflare-network
job_launcher:
  default_python_path: /usr/local/bin/python
  default_job_env:
    NCCL_P2P_DISABLE: "1"
  default_job_container_kwargs:
    shm_size: 8g
    ipc_mode: host
Prepare each server or client startup kit that should run in Docker:
nvflare deploy prepare workspace/<project>/prod_00/server \
    --config docker.yaml \
    --output workspace/<project>/prepared/server
nvflare deploy prepare workspace/<project>/prod_00/site-1 \
    --config docker.yaml \
    --output workspace/<project>/prepared/site-1
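When a project has many sites, typing one prepare command per kit becomes repetitive. The following is a minimal sketch, not part of NVFlare, that builds the same command lines shown above for a list of kit names. The helper name prepare_commands, the project name example_project, and the "prepared" output folder are assumptions for illustration; only the --config and --output flags shown above are used.

```python
import shlex
from pathlib import PurePosixPath


def prepare_commands(prod_dir, sites, config="docker.yaml", out_dir="prepared"):
    """Build one `nvflare deploy prepare` argument list per startup kit.

    prod_dir: provisioned output folder, e.g. workspace/<project>/prod_00
    sites:    names of the server/client kits to prepare (leave admin kits out)
    """
    prod = PurePosixPath(prod_dir)
    # Prepared kits go into a sibling folder of prod_00, mirroring the commands above.
    prepared_root = prod.parent / out_dir
    commands = []
    for site in sites:
        commands.append([
            "nvflare", "deploy", "prepare", str(prod / site),
            "--config", config,
            "--output", str(prepared_root / site),
        ])
    return commands


if __name__ == "__main__":
    # "example_project" is a hypothetical project name.
    for argv in prepare_commands("workspace/example_project/prod_00", ["server", "site-1"]):
        print(shlex.join(argv))
```

Each returned list can be passed to subprocess.run(argv, check=True) once you have reviewed the printed commands.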
The prepared kits contain startup/start_docker.sh, patched launcher
configuration, and a local/study_data.yaml template. Admin startup kits are
not prepared because they do not run parent server or client processes.
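Before starting a parent container, it can help to confirm that a kit was actually prepared. The sketch below checks only for the two files named above; the helper name missing_prepared_files is hypothetical, and a real prepared kit may contain other files that this check does not cover.

```python
from pathlib import Path

# Files that the text above says a prepared kit contains.
EXPECTED_FILES = ("startup/start_docker.sh", "local/study_data.yaml")


def missing_prepared_files(kit_dir):
    """Return the expected files that are absent from a prepared kit folder."""
    kit = Path(kit_dir)
    return [rel for rel in EXPECTED_FILES if not (kit / rel).is_file()]
```

An empty list means both files are present; a non-empty list suggests the folder is a raw provisioned kit that still needs nvflare deploy prepare.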
Start prepared parent processes with:
cd workspace/<project>/prepared/server
bash startup/start_docker.sh
Run the same command from each prepared client kit. The generated script creates the configured Docker network if needed and mounts the prepared kit into the parent container.
Jobs submitted to Docker-mode sites must specify their job image in
launcher_spec:
{
  "launcher_spec": {
    "default": {
      "docker": {"image": "nvflare-job:latest"}
    },
    "site-1": {
      "docker": {"shm_size": "8g", "ipc_mode": "host"}
    }
  },
  "resource_spec": {
    "site-1": {"num_of_gpus": 1}
  }
}
Use launcher_spec for launcher-specific image and container settings. Keep
scheduler-facing resource requests, such as num_of_gpus, in
resource_spec. Sites that are not configured with the Docker job launcher
continue to use their configured launcher, usually process mode.
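A job image can come from a site-specific entry or from the "default" entry of launcher_spec. The sketch below is a pre-submission check under that assumption: it looks for an image under the site key first and then under "default". The function name find_docker_image is hypothetical, and this is not NVFlare's own resolution code.

```python
import json


def find_docker_image(meta, site):
    """Return the Docker job image for `site`, or None if none is set.

    Checks launcher_spec[site]["docker"]["image"] first, then
    launcher_spec["default"]["docker"]["image"].
    """
    launcher_spec = meta.get("launcher_spec", {})
    for key in (site, "default"):
        image = launcher_spec.get(key, {}).get("docker", {}).get("image")
        if image:
            return image
    return None


# Same content as the meta.json fragment above.
meta = json.loads("""
{
  "launcher_spec": {
    "default": {"docker": {"image": "nvflare-job:latest"}},
    "site-1": {"docker": {"shm_size": "8g", "ipc_mode": "host"}}
  },
  "resource_spec": {"site-1": {"num_of_gpus": 1}}
}
""")

print(find_docker_image(meta, "site-1"))  # nvflare-job:latest (taken from "default")
```

A None result for any Docker-mode site indicates that the job would be submitted without a job image for that site.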
Development Container
A single-container image can also be useful for local development or simulator
work. In this case, Docker captures a Python environment with NVFlare and your
development dependencies installed. This is an alternative to a bare-metal
Python virtual environment, not a replacement for nvflare deploy prepare
when using the Docker job launcher.
You can use this single-container pattern to run all FLARE processes on one
host for development: run the simulator, run POC mode, or start provisioned
server/client scripts manually from different shells in the same container.
This is different from Docker runtime deployment. The job launcher remains
process mode unless the server or client startup kit is prepared with
nvflare deploy prepare.
After you build an image for your environment and tag it nvflare-dev:latest,
run it with GPU support and a persistent workspace:
mkdir -p my-workspace
docker run --rm -it --gpus all \
    --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v "$(pwd -P)/my-workspace:/workspace/my-workspace" \
    nvflare-dev:latest
Once the container is running, you can open additional shells in it with docker exec, for example
when you need another terminal to start additional FLARE clients. Find the CONTAINER ID with
docker ps, then use that ID:
docker ps # use the CONTAINER ID in the output
docker exec -it <CONTAINER ID> /bin/bash
Best Practices
Always use the latest compatible NVIDIA Container Toolkit version
Use nvflare deploy prepare for Docker runtime startup kits
Keep parent image and job image responsibilities explicit
Put Docker image and container settings in launcher_spec
Put scheduler-facing resource requests in resource_spec
Mount volumes for persistent data storage
Keep your base images updated for security patches
Note
Docker Compose deployment is deprecated. Use nvflare deploy prepare for
current Docker runtime preparation.
Common Issues and Solutions
Docker daemon access: ensure the user running start_docker.sh can access the Docker daemon.
Missing job image: ensure every Docker-mode job provides launcher_spec[site]["docker"]["image"] or launcher_spec["default"]["docker"]["image"].
GPU access issues: ensure NVIDIA Container Toolkit is properly installed and the job’s resource_spec requests the needed GPUs.
Memory or shared-memory issues: set container kwargs such as shm_size or ipc_mode in launcher_spec or in deploy prepare default_job_container_kwargs.
Network connectivity: keep the configured Docker network and server hostname resolvable from admin, parent, and job containers.
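For the Docker daemon access issue, a quick check is to run docker info as the user who will run start_docker.sh. The sketch below wraps that check in Python; the function name docker_daemon_reachable is hypothetical, and the 30-second timeout is an arbitrary choice. It covers only the first item in the list above.

```python
import shutil
import subprocess


def docker_daemon_reachable(run=subprocess.run, which=shutil.which):
    """Return True if the `docker` CLI is on PATH and `docker info` succeeds.

    `run` and `which` are injectable so the check can be exercised without Docker.
    """
    if which("docker") is None:
        return False  # Docker CLI is not installed or not on PATH
    try:
        result = run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A non-zero exit code usually means the daemon is down or access is denied.
    return result.returncode == 0
```

If this returns False while the Docker CLI is installed, check that the daemon is running and that the current user is permitted to use it.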