Hello TensorFlow

This example demonstrates how to use NVIDIA FLARE with TensorFlow to train an image classifier using federated averaging (FedAvg). TensorFlow serves as the deep learning training framework in this example.

For detailed documentation, see the Hello TensorFlow example page.

We recommend the NVIDIA TensorFlow Docker container for GPU support. If a GPU is not required, a Python virtual environment is sufficient.

To run this example with the FLARE API, refer to the hello_world notebook.

Run NVIDIA TensorFlow Container

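Ensure the NVIDIA Container Toolkit is installed. Then run the following command, replacing [path_to_NVFlare] with the path to your local NVFlare checkout and xx.xx with the container release you want to use: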

docker run --gpus=all -it --rm -v [path_to_NVFlare]:/NVFlare nvcr.io/nvidia/tensorflow:xx.xx-tf2-py3

NVIDIA FLARE Installation

For complete installation instructions, visit Installation.

pip install nvflare

Clone the example code from GitHub:

git clone https://github.com/NVIDIA/NVFlare.git

Switch to the release branch and navigate to the hello-tf directory:

cd NVFlare
git switch <release branch>
cd examples/hello-world/hello-tf

Install the dependencies:

pip install -r requirements.txt

Code Structure

hello-tf
|
|-- client.py         # client local training script
|-- model.py          # model definition
|-- job.py            # job recipe that defines client and server configurations
|-- requirements.txt  # dependencies

Data

This example uses the MNIST handwritten digits dataset, which is loaded within the client code (client.py) via tf.keras.datasets.mnist.load_data().

Model

The model.py file defines a simple neural network using TensorFlow’s Keras API. The Net model is a sequential architecture designed for image classification, featuring:

Flatten Layer: Prepares input data for dense layers.
Dense Layer: 128 units with ReLU activation for non-linearity.
Dropout Layer: 20% dropout rate to mitigate overfitting.
Output Layer: 10 units, one logit per MNIST digit class (no softmax; the loss in client.py is configured with from_logits=True).

This model is used in federated learning with NVIDIA FLARE, trained across clients using the FedAvg algorithm.

model code (model.py)

from tensorflow.keras import layers, models


class Net(models.Sequential):
    def __init__(self, input_shape=(None, 28, 28)):
        super().__init__()
        self._input_shape = input_shape
        self.add(layers.Flatten())
        self.add(layers.Dense(128, activation="relu"))
        self.add(layers.Dropout(0.2))
        self.add(layers.Dense(10))
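
As a quick sanity check on this architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes the default 28x28 input and uses plain Python arithmetic; the helper name dense_params is illustrative and is not part of model.py.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


flattened = 28 * 28                     # Flatten: 784 features, no parameters
hidden = dense_params(flattened, 128)   # Dense(128): 784 * 128 + 128
output = dense_params(128, 10)          # Dense(10): 128 * 10 + 10
total = hidden + output                 # Dropout adds no parameters

print(hidden, output, total)
```

The total should agree with the parameter count that model.summary() reports when client.py runs.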

Client Code

The client code client.py is responsible for local training. It closely resembles standard TensorFlow training code, with a few additional lines to handle data exchange with the server.

client code (client.py)

import tensorflow as tf
from model import Net

import nvflare.client as flare
from nvflare.client.tracking import SummaryWriter

WEIGHTS_PATH = "./tf_model.weights.h5"


def main():
    flare.init()
    writer = SummaryWriter()

    sys_info = flare.system_info()
    print(f"system info is: {sys_info}", flush=True)

    model = Net()
    model.build(input_shape=(None, 28, 28))
    model.compile(
        optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=["accuracy"]
    )
    model.summary()

    (train_images, train_labels), (
        test_images,
        test_labels,
    ) = tf.keras.datasets.mnist.load_data()
    train_images, test_images = (
        train_images / 255.0,
        test_images / 255.0,
    )

    # simulate separate datasets for each client by dividing MNIST dataset in half
    client_name = sys_info["site_name"]
    if client_name == "site-1":
        train_images = train_images[: len(train_images) // 2]
        train_labels = train_labels[: len(train_labels) // 2]
        test_images = test_images[: len(test_images) // 2]
        test_labels = test_labels[: len(test_labels) // 2]
    elif client_name == "site-2":
        train_images = train_images[len(train_images) // 2 :]
        train_labels = train_labels[len(train_labels) // 2 :]
        test_images = test_images[len(test_images) // 2 :]
        test_labels = test_labels[len(test_labels) // 2 :]

    while flare.is_running():
        input_model = flare.receive()
        print(f"current_round={input_model.current_round}")

        sys_info = flare.system_info()
        print(f"system info is: {sys_info}")

        for k, v in input_model.params.items():
            model.get_layer(k).set_weights(v)

        _, test_global_acc = model.evaluate(test_images, test_labels, verbose=2)
        print(
            f"Accuracy of the received model on round {input_model.current_round} on the test images: {test_global_acc * 100} %"
        )
        writer.add_scalar(tag="local_acc", scalar=test_global_acc, global_step=input_model.current_round)

        # training
        model.fit(train_images, train_labels, epochs=1, validation_data=(test_images, test_labels))

        print("Finished Training")

        model.save_weights(WEIGHTS_PATH)

        sys_info = flare.system_info()
        print(f"system info is: {sys_info}", flush=True)
        print(f"finished round: {input_model.current_round}", flush=True)

        output_model = flare.FLModel(
            params={layer.name: layer.get_weights() for layer in model.layers},
            params_type="FULL",
            metrics={"accuracy": test_global_acc},
            current_round=input_model.current_round,
        )

        flare.send(output_model)


if __name__ == "__main__":
    main()
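
The per-site split in client.py can be isolated into a small helper, which makes the partitioning rule easier to see and to test. This is a hypothetical sketch: split_for_site is not part of the example code, and it works on any sliceable sequence (a Python list here, NumPy arrays in the real script).

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_for_site(data: Sequence[T], site_name: str) -> Sequence[T]:
    """Return the first half for site-1 and the second half for site-2.

    Any other site name gets the full dataset, mirroring client.py,
    which leaves the data untouched when neither branch matches.
    """
    half = len(data) // 2
    if site_name == "site-1":
        return data[:half]
    if site_name == "site-2":
        return data[half:]
    return data


samples = list(range(10))
print(split_for_site(samples, "site-1"))  # first half of the samples
print(split_for_site(samples, "site-2"))  # second half of the samples
```

Because the halves do not overlap, the two simulated sites never train on the same images.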

Server Code

In federated averaging, the server code aggregates model updates from clients, following a scatter-gather workflow pattern. This example uses the default federated averaging algorithm provided by NVFlare, eliminating the need for custom server code.
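
To make the aggregation step concrete, here is a minimal pure-Python sketch of weighted federated averaging. It is an illustration only, not NVFlare's implementation: the function name fedavg, the flat lists of floats standing in for layer weights, and the sample counts are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def fedavg(updates: List[Tuple[Params, int]]) -> Params:
    """Average client parameters, weighting each client by its sample count."""
    total_samples = sum(n for _, n in updates)
    if total_samples <= 0:
        raise ValueError("at least one client must report samples")
    first_params = updates[0][0]
    averaged: Params = {}
    for name, values in first_params.items():
        acc = [0.0] * len(values)
        for params, n in updates:
            for i, v in enumerate(params[name]):
                acc[i] += v * n / total_samples
        averaged[name] = acc
    return averaged


# Two hypothetical clients with equal data sizes, as in a 50/50 MNIST split.
site_1 = ({"dense": [1.0, 2.0]}, 30000)
site_2 = ({"dense": [3.0, 6.0]}, 30000)
global_params = fedavg([site_1, site_2])
print(global_params)
```

With equal sample counts the result is a plain mean of the two clients; with unequal counts, the client holding more data pulls the average toward its own weights.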

Job Recipe Code

The job recipe includes client.py and the built-in FedAvg algorithm.

job recipe (job.py)

from model import Net

from nvflare.app_opt.tf.recipes.fedavg import FedAvgRecipe
from nvflare.recipe import SimEnv, add_experiment_tracking

if __name__ == "__main__":
    n_clients = 2
    num_rounds = 3
    train_script = "client.py"

    recipe = FedAvgRecipe(
        name="hello-tf_fedavg",
        num_rounds=num_rounds,
        # Model can be specified as class instance or dict config:
        model=Net(),
        # Alternative: model={"class_path": "model.Net", "args": {}},
        # For pre-trained weights: initial_ckpt="/server/path/to/model.h5",
        min_clients=n_clients,
        train_script=train_script,
    )
    add_experiment_tracking(recipe, tracking_type="tensorboard")

    env = SimEnv(num_clients=n_clients)
    run = recipe.execute(env=env)
    print()
    print("Result can be found in :", run.get_result())
    print("Job Status is:", run.get_status())
    print()

Model Input Options

The model parameter accepts two formats:

Class instance (subclassed Keras model): model=Net() - Convenient and Pythonic
Dict config: model={"class_path": "model.Net", "args": {}} - Better for large models
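
To illustrate what a dict config expresses, the sketch below shows one common way a class_path string and an args mapping can be turned into an object. It is an assumption-laden illustration, not NVFlare's loading code: the instantiate helper is hypothetical, and collections.OrderedDict stands in for model.Net so the snippet runs without TensorFlow.

```python
import importlib
from typing import Any, Dict


def instantiate(config: Dict[str, Any]) -> Any:
    """Import the class named by config["class_path"] and call it with config["args"]."""
    module_name, _, class_name = config["class_path"].rpartition(".")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(**config.get("args", {}))


# Stand-in for {"class_path": "model.Net", "args": {}}
obj = instantiate({"class_path": "collections.OrderedDict", "args": {}})
print(type(obj).__name__)
```

The point of this form is that only a small, serializable description travels with the job, and the object is built where it is needed.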

To resume from pre-trained weights:

recipe = FedAvgRecipe(
    model=Net(),
    initial_ckpt="/server/path/to/pretrained.h5",  # Absolute path
    ...
)

Note

For TensorFlow/Keras, use a subclassed Keras class instance (for example, Net()) or dict config for model. SavedModel or .h5 files contain both architecture and weights, so initial_ckpt can be used without model.

Run the Experiment

Execute the script using the job API to create the job and run it with the simulator:

TF_FORCE_GPU_ALLOW_GROWTH=true TF_GPU_ALLOCATOR=cuda_malloc_async python3 job.py

Access the Logs and Results

Find the running logs and results inside the simulator’s workspace:

$ ls /tmp/nvflare/jobs/workdir

Notes on Running with GPUs

When using GPUs, TensorFlow by default attempts to allocate nearly all available GPU memory at startup. To prevent this when several clients share a GPU, set the following environment variables:

TF_FORCE_GPU_ALLOW_GROWTH=true TF_GPU_ALLOCATOR=cuda_malloc_async
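
If setting variables on the command line is inconvenient, the same values can be applied from Python. This is a sketch that assumes the lines run before TensorFlow is first imported in the process (for example, at the very top of a script), so that the settings are in place when TensorFlow initializes the GPU.

```python
import os

# Set these before `import tensorflow`; the values match the flags above.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
os.environ["TF_GPU_ALLOCATOR"] = "cuda_malloc_async"

# import tensorflow as tf  # import only after the variables are set
```

Child processes started afterwards inherit these variables from the parent's environment.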

If you have more GPUs than clients, consider running one client per GPU using the --gpu argument during simulation, e.g., nvflare simulator -n 2 --gpu 0,1 [job].