Hello JAX

Warning

This example tracks main, which is the NVFlare development branch for the next release. On main, example requirements.txt files may pin the first upcoming NVFlare version that supports a feature, even before that package is published on PyPI. If the pinned nvflare version is not available yet, install NVFlare from this repo instead of from PyPI.

This example demonstrates how to use NVIDIA FLARE with JAX, Flax, and Optax to train an MNIST classifier using federated averaging (FedAvg). It follows the same hello-world recipe structure as hello-pt, but uses a JAX client training loop and a flattened parameter vector for transport.

Install NVFLARE and Dependencies

For the complete installation instructions, see Installation.

pip install nvflare

First get the example code from GitHub:

git clone https://github.com/NVIDIA/NVFlare.git

Then switch to the release branch and navigate to the hello-jax directory:

cd NVFlare
git switch <release branch>
cd examples/hello-world/hello-jax

Install the dependencies:

pip install -r requirements.txt

Code Structure

hello-jax
|
|-- client.py         # client local training script
|-- model.py          # JAX/Flax model helpers
|-- prepare_data.py   # helper that downloads MNIST and writes .npy files
|-- prepare_model.py  # helper that writes the initial flattened checkpoint
|-- job.py            # job recipe that defines client and server configurations
|-- requirements.txt  # dependencies

Data

This example uses the MNIST dataset. The prepare_data.py helper downloads the raw MNIST files once, before the simulator starts, and converts them into .npy files. Each client then loads its partition from that prepared cache.

Model

The model in model.py is a small convolutional neural network implemented with Flax.
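
As a rough size check, the length of the flattened parameter vector can be worked out by hand from the layer definitions in model.py. The sketch below assumes the Flax defaults (SAME padding for nn.Conv, VALID padding for nn.avg_pool, and a bias term on every layer). The helper names conv_params and dense_params are illustrative and not part of the example.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Kernel weights plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weight matrix plus one bias per output feature."""
    return in_features * out_features + out_features


# 28x28x1 input: SAME-padded convolutions keep the spatial size,
# and each 2x2 average pool with stride 2 halves it (28 -> 14 -> 7).
spatial = 28 // 2 // 2
flat_features = spatial * spatial * 64

layer_sizes = {
    "conv1": conv_params(3, 3, 1, 32),
    "conv2": conv_params(3, 3, 32, 64),
    "dense1": dense_params(flat_features, 128),
    "dense2": dense_params(128, 10),
}
total_params = sum(layer_sizes.values())
print(layer_sizes)
print("total parameters:", total_params)
```

Under these assumptions the vector exchanged with the server holds a few hundred thousand float32 values, most of them in the first dense layer.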

model code (model.py)

"""
JAX/Flax model utilities for the hello-jax MNIST example.
"""

from functools import lru_cache

import jax
import jax.numpy as jnp
import numpy as np
import optax
from flax import linen as nn
from flax.training import train_state
from jax.flatten_util import ravel_pytree


class ConvNet(nn.Module):
    """Small CNN for MNIST classification."""

    @nn.compact
    def __call__(self, x):
        x = nn.Conv(features=32, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = nn.Conv(features=64, kernel_size=(3, 3))(x)
        x = nn.relu(x)
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = x.reshape((x.shape[0], -1))
        x = nn.Dense(features=128)(x)
        x = nn.relu(x)
        x = nn.Dense(features=10)(x)
        return x


MODEL = ConvNet()
INPUT_SHAPE = (1, 28, 28, 1)


@lru_cache(maxsize=1)
def _template_tree_and_unravel_fn():
    params = MODEL.init(jax.random.PRNGKey(0), jnp.ones(INPUT_SHAPE, dtype=jnp.float32))["params"]
    _, unravel_fn = ravel_pytree(params)
    return params, unravel_fn


def create_initial_params():
    params, _ = _template_tree_and_unravel_fn()
    return params


def flatten_params(params) -> np.ndarray:
    flat_params, _ = ravel_pytree(params)
    return np.asarray(flat_params, dtype=np.float32)


def unflatten_params(flat_params):
    _, unravel_fn = _template_tree_and_unravel_fn()
    return unravel_fn(jnp.asarray(flat_params, dtype=jnp.float32))


def create_train_state(params, learning_rate: float, momentum: float) -> train_state.TrainState:
    tx = optax.sgd(learning_rate=learning_rate, momentum=momentum)
    return train_state.TrainState.create(apply_fn=MODEL.apply, params=params, tx=tx)
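
The flatten and unflatten helpers above rely on jax.flatten_util.ravel_pytree. The following minimal sketch shows the same round trip on a toy parameter tree; the tree layout and shapes are made up for illustration and do not match the ConvNet parameters.

```python
import jax.numpy as jnp
import numpy as np
from jax.flatten_util import ravel_pytree

# Toy parameter tree standing in for a Flax "params" dict (hypothetical shapes).
params = {
    "Dense_0": {
        "kernel": jnp.ones((3, 2), dtype=jnp.float32),
        "bias": jnp.zeros((2,), dtype=jnp.float32),
    },
    "Dense_1": {
        "kernel": jnp.full((2, 1), 0.5, dtype=jnp.float32),
        "bias": jnp.zeros((1,), dtype=jnp.float32),
    },
}

# ravel_pytree returns a 1-D array plus a function that rebuilds the tree.
flat, unravel_fn = ravel_pytree(params)
flat_np = np.asarray(flat, dtype=np.float32)  # NumPy vector suitable for transport

# Rebuild the nested structure from the flat vector.
restored = unravel_fn(jnp.asarray(flat_np))
print(flat_np.shape, restored["Dense_0"]["kernel"].shape)
```

The unravel function is tied to the tree structure it was created from, which is why model.py builds it once from a template initialization and caches it.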

Client Code

The client code (client.py) keeps the local training loop in JAX while using NVFlare’s client API to receive the current global model and return the updated parameters.

client code (client.py)

"""
Client-side JAX/Flax training script for the hello-jax example.
"""

import argparse
import math
import re

import jax
import jax.numpy as jnp
import numpy as np
import optax
from model import MODEL, create_train_state, flatten_params, unflatten_params

import nvflare.client as flare
from nvflare.apis.fl_constant import FLMetaKey
from nvflare.app_common.np.constants import NPConstants


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--learning_rate", type=float, default=0.05)
    parser.add_argument("--momentum", type=float, default=0.9)
    parser.add_argument("--num_partitions", type=int, default=2)
    parser.add_argument("--data_dir", type=str, default="/tmp/nvflare/data/hello-jax/mnist")
    return parser.parse_args()


def load_mnist(data_dir: str):
    train_images = np.load(f"{data_dir}/train_images.npy").astype(np.float32) / 255.0
    train_labels = np.load(f"{data_dir}/train_labels.npy").astype(np.int32)
    test_images = np.load(f"{data_dir}/test_images.npy").astype(np.float32) / 255.0
    test_labels = np.load(f"{data_dir}/test_labels.npy").astype(np.int32)

    return (train_images, train_labels), (test_images, test_labels)


def split_for_client(images, labels, client_name: str, num_partitions: int):
    match = re.search(r"(\d+)$", client_name)
    if not match:
        return images, labels

    client_number = int(match.group(1))
    if client_number <= 0:
        raise ValueError(f"Client name '{client_name}' must use 1-indexed site numbering.")

    client_idx = client_number - 1
    partitions = max(num_partitions, 1)
    image_splits = np.array_split(images, partitions)
    label_splits = np.array_split(labels, partitions)
    if client_idx >= len(image_splits):
        raise ValueError(
            f"Client index {client_idx + 1} from site name '{client_name}' exceeds available partitions "
            f"{len(image_splits)}."
        )
    return image_splits[client_idx], label_splits[client_idx]


@jax.jit
def train_step(state, images, labels):
    def loss_fn(params):
        logits = MODEL.apply({"params": params}, images)
        loss = optax.softmax_cross_entropy_with_integer_labels(logits, labels).mean()
        return loss, logits

    (loss, logits), grads = jax.value_and_grad(loss_fn, has_aux=True)(state.params)
    state = state.apply_gradients(grads=grads)
    accuracy = jnp.mean(jnp.argmax(logits, axis=-1) == labels)
    return state, loss, accuracy


@jax.jit
def eval_step(params, images, labels):
    logits = MODEL.apply({"params": params}, images)
    loss = optax.softmax_cross_entropy_with_integer_labels(logits, labels).mean()
    accuracy = jnp.mean(jnp.argmax(logits, axis=-1) == labels)
    return loss, accuracy


def train_epoch(state, images, labels, batch_size: int, rng):
    num_examples = len(images)
    if num_examples == 0:
        raise ValueError("No training data available for this client.")

    permutation = np.asarray(jax.random.permutation(rng, num_examples))
    total_loss = 0.0
    total_accuracy = 0.0
    steps = 0

    for start in range(0, num_examples, batch_size):
        end = start + batch_size
        indices = permutation[start:end]
        batch_images = jnp.asarray(images[indices])
        batch_labels = jnp.asarray(labels[indices])
        state, loss, accuracy = train_step(state, batch_images, batch_labels)
        total_loss += float(loss)
        total_accuracy += float(accuracy)
        steps += 1

    return state, total_loss / steps, total_accuracy / steps, steps


def evaluate(params, images, labels, batch_size: int):
    num_examples = len(images)
    if num_examples == 0:
        raise ValueError("No evaluation data available for this client.")

    total_loss = 0.0
    total_accuracy = 0.0
    steps = 0

    for start in range(0, num_examples, batch_size):
        end = start + batch_size
        batch_images = jnp.asarray(images[start:end])
        batch_labels = jnp.asarray(labels[start:end])
        loss, accuracy = eval_step(params, batch_images, batch_labels)
        total_loss += float(loss)
        total_accuracy += float(accuracy)
        steps += 1

    return total_loss / steps, total_accuracy / steps


def main():
    args = parse_args()
    flare.init()

    sys_info = flare.system_info()
    client_name = sys_info["site_name"]

    (train_images, train_labels), (test_images, test_labels) = load_mnist(args.data_dir)
    train_images, train_labels = split_for_client(train_images, train_labels, client_name, args.num_partitions)
    test_images, test_labels = split_for_client(test_images, test_labels, client_name, args.num_partitions)

    print(f"site={client_name}, train_samples={len(train_images)}, test_samples={len(test_images)}")

    rng = jax.random.PRNGKey(0)
    while flare.is_running():
        input_model = flare.receive()
        current_round = input_model.current_round
        flat_params = input_model.params[NPConstants.NUMPY_KEY]
        params = unflatten_params(flat_params)

        received_eval_loss, received_accuracy = evaluate(params, test_images, test_labels, args.batch_size)
        print(
            f"site={client_name}, round={current_round}, "
            f"received_model_eval_loss={received_eval_loss:.4f}, accuracy={received_accuracy:.4f}"
        )

        if flare.is_evaluate():
            flare.send(flare.FLModel(metrics={"accuracy": received_accuracy, "eval_loss": received_eval_loss}))
            continue

        state = create_train_state(params, learning_rate=args.learning_rate, momentum=args.momentum)
        steps_per_epoch = math.ceil(len(train_images) / args.batch_size)

        for epoch in range(args.epochs):
            rng, epoch_rng = jax.random.split(rng)
            state, train_loss, train_accuracy, _ = train_epoch(
                state,
                train_images,
                train_labels,
                args.batch_size,
                epoch_rng,
            )
            print(
                f"site={client_name}, round={current_round}, epoch={epoch + 1}, "
                f"train_loss={train_loss:.4f}, train_accuracy={train_accuracy:.4f}"
            )

        updated_eval_loss, updated_accuracy = evaluate(state.params, test_images, test_labels, args.batch_size)
        print(
            f"site={client_name}, round={current_round}, "
            f"trained_model_eval_loss={updated_eval_loss:.4f}, accuracy={updated_accuracy:.4f}"
        )

        updated_params = flatten_params(state.params)
        output_model = flare.FLModel(
            params={NPConstants.NUMPY_KEY: updated_params},
            params_type=flare.ParamsType.FULL,
            metrics={"accuracy": updated_accuracy, "eval_loss": updated_eval_loss},
            meta={FLMetaKey.NUM_STEPS_CURRENT_ROUND: args.epochs * steps_per_epoch},
        )
        flare.send(output_model)


if __name__ == "__main__":
    main()
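
To make the partitioning and step counting in client.py concrete, here is a small sketch that applies the same np.array_split logic to index ranges instead of images. It assumes the standard 60,000-image MNIST training set, two sites named site-1 and site-2, and the default batch size of 128. The helper partition_for_site is illustrative and not part of the example.

```python
import math
import re

import numpy as np


def partition_for_site(num_examples, site_name, num_partitions):
    """Return the example indices a site would receive under np.array_split."""
    match = re.search(r"(\d+)$", site_name)
    if not match:
        raise ValueError(f"Site name '{site_name}' has no trailing number.")
    site_idx = int(match.group(1)) - 1  # site names are 1-indexed
    return np.array_split(np.arange(num_examples), num_partitions)[site_idx]


site1 = partition_for_site(60000, "site-1", 2)
site2 = partition_for_site(60000, "site-2", 2)

# The last batch of an epoch may be smaller than batch_size, hence the ceiling.
steps_per_epoch = math.ceil(len(site1) / 128)
print(len(site1), len(site2), steps_per_epoch)
```

With one local epoch, steps_per_epoch is also the value a client would report under FLMetaKey.NUM_STEPS_CURRENT_ROUND.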

Server Code

This example uses the base FedAvgRecipe configured for NumPy parameter exchange. The JAX parameter tree is flattened into a single NumPy vector before it is exchanged with the server, then reconstructed on the client before each training round.
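
Because every client returns a vector of the same length, aggregation reduces to arithmetic on NumPy arrays. The sketch below illustrates a weighted average over flat vectors, assuming each client's weight is proportional to the step count it reports. It is a conceptual illustration only, not NVFlare's aggregator implementation, and weighted_average is a hypothetical helper.

```python
import numpy as np


def weighted_average(updates):
    """Average flat parameter vectors, weighting each by its step count.

    updates: list of (flat_params, num_steps) tuples.
    """
    total_steps = sum(num_steps for _, num_steps in updates)
    # Accumulate in float64 to limit rounding error, then cast back.
    accumulator = np.zeros_like(updates[0][0], dtype=np.float64)
    for flat_params, num_steps in updates:
        accumulator += flat_params.astype(np.float64) * (num_steps / total_steps)
    return accumulator.astype(np.float32)


site1_update = (np.array([1.0, 2.0, 3.0], dtype=np.float32), 235)
site2_update = (np.array([3.0, 4.0, 5.0], dtype=np.float32), 235)
global_params = weighted_average([site1_update, site2_update])
print(global_params)
```

With equal step counts, as in the two-site default, this is a plain element-wise mean of the two vectors.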

Before running the job, prepare two resources:

The initial flattened checkpoint is generated by prepare_model.py and passed to FedAvgRecipe through initial_ckpt.
The shared MNIST .npy cache is prepared once by prepare_data.py so that the simulated clients neither download the dataset at the same time nor depend on TensorFlow-only data utilities.
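
The checkpoint format can be illustrated with a small round trip through a .npy file. prepare_model.py is not listed here, so this sketch only assumes that the checkpoint is a 1-D float32 vector written with np.save. The vector contents and the temporary path are placeholders.

```python
import os
import tempfile

import numpy as np

# Placeholder vector standing in for the output of flatten_params().
flat_params = np.linspace(-1.0, 1.0, num=8, dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt_path = os.path.join(tmp_dir, "initial_model.npy")
    np.save(ckpt_path, flat_params)  # write the flattened checkpoint
    loaded = np.load(ckpt_path)      # read it back as a NumPy array

print(loaded.shape, loaded.dtype)
```

A quick np.load on the real checkpoint is a cheap way to confirm its length and dtype before starting a job.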

Prepare Assets

Prepare the initial checkpoint and dataset using the default locations under /tmp/nvflare/data/hello-jax:

python prepare_model.py
python prepare_data.py

You can also prepare them in custom locations:

python prepare_model.py --output /path/to/initial_model.npy
python prepare_data.py --data_dir /path/to/mnist

Job Recipe Code

job recipe (job.py)

"""Recipe entrypoint for the hello-jax MNIST example."""

import argparse
import os
import shlex

from nvflare.client.config import ExchangeFormat
from nvflare.fuel.utils.constants import FrameworkType
from nvflare.recipe import FedAvgRecipe, SimEnv

DEFAULT_INITIAL_CKPT = "/tmp/nvflare/data/hello-jax/initial_model.npy"
DEFAULT_DATA_DIR = "/tmp/nvflare/data/hello-jax/mnist"
REQUIRED_DATA_FILES = ("train_images.npy", "train_labels.npy", "test_images.npy", "test_labels.npy")


def define_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--n_clients", type=int, default=2)
    parser.add_argument("--num_rounds", type=int, default=3)
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--learning_rate", type=float, default=0.05)
    parser.add_argument("--momentum", type=float, default=0.9)
    parser.add_argument("--data_dir", type=str, default=DEFAULT_DATA_DIR)
    parser.add_argument("--initial_ckpt", type=str, default=DEFAULT_INITIAL_CKPT)
    parser.add_argument("--train_script", type=str, default="client.py")
    parser.add_argument(
        "--launch_external_process",
        action="store_true",
        help="Run train_script in a separate subprocess instead of in-process.",
    )
    return parser.parse_args()


def _validate_inputs(initial_ckpt: str, data_dir: str) -> None:
    if not os.path.isfile(initial_ckpt):
        raise FileNotFoundError(
            f"Initial checkpoint not found: {initial_ckpt}. "
            f"Run `python prepare_model.py --output {initial_ckpt}` first."
        )

    missing_files = [name for name in REQUIRED_DATA_FILES if not os.path.isfile(os.path.join(data_dir, name))]
    if missing_files:
        missing_str = ", ".join(missing_files)
        raise FileNotFoundError(
            f"Prepared MNIST files missing in {data_dir}: {missing_str}. "
            f"Run `python prepare_data.py --data_dir {data_dir}` first."
        )


def _build_train_args(args) -> str:
    return shlex.join(
        [
            "--epochs",
            str(args.epochs),
            "--batch_size",
            str(args.batch_size),
            "--learning_rate",
            str(args.learning_rate),
            "--momentum",
            str(args.momentum),
            "--num_partitions",
            str(args.n_clients),
            "--data_dir",
            args.data_dir,
        ]
    )


def main():
    args = define_parser()

    _validate_inputs(args.initial_ckpt, args.data_dir)
    train_args = _build_train_args(args)

    recipe = FedAvgRecipe(
        name="hello-jax",
        min_clients=args.n_clients,
        num_rounds=args.num_rounds,
        initial_ckpt=args.initial_ckpt,
        train_script=args.train_script,
        train_args=train_args,
        launch_external_process=args.launch_external_process,
        framework=FrameworkType.NUMPY,
        server_expected_format=ExchangeFormat.NUMPY,
    )

    env = SimEnv(num_clients=args.n_clients)
    run = recipe.execute(env)
    print()
    print("Job Status is:", run.get_status())
    print("Result can be found in :", run.get_result())
    print()


if __name__ == "__main__":
    main()

Run Job

After the assets are prepared, run the job script to execute the job in a simulation environment.

python job.py

You can adjust the main hyperparameters from the command line as needed:

python job.py --n_clients 2 --num_rounds 3 --epochs 1 --batch_size 128

If you prepared the assets in non-default locations, pass them explicitly:

python job.py --initial_ckpt /path/to/initial_model.npy --data_dir /path/to/mnist

Output Summary

Initialization: BaseModelController starts the FedAvg workflow, loads the initial flattened checkpoint, and writes simulation output under /tmp/nvflare/simulation/hello-jax.
Round 0: site-1 and site-2 are sampled, evaluate the received model at 0.0527 and 0.0398 accuracy, then train for one epoch to 0.8887 / 0.8857 training accuracy with 0.3616 / 0.3778 loss. The client log also includes a post-training evaluation line with trained_model_eval_loss and accuracy before the update is sent back to the server.
Round 1: Both sites are sampled again. Received-model accuracy improves to 0.9545 and 0.9799, and local training reaches 0.9702 / 0.9686 accuracy with 0.0990 / 0.0999 loss. The client log includes a second evaluation pass on the trained local model, and the aggregated validation metric becomes a new best at 0.9671875.
Round 2: Received-model accuracy improves again to 0.9762 and 0.9900, and local training finishes at 0.9795 / 0.9790 accuracy with 0.0671 / 0.0683 loss. The client log again reports trained_model_eval_loss and accuracy after local training, and the aggregated validation metric becomes a new best at 0.98310546875.
Completion: FedAvg finishes after 3 rounds, persists the final NumPy checkpoint to /tmp/nvflare/simulation/hello-jax/server/simulate_job/models/server.npy, and reports the simulation result directory at /tmp/nvflare/simulation/hello-jax.