Below is a list of terms and concepts in NVIDIA FLARE and their definitions.
The Aggregator defines the algorithm used on the server to aggregate the data sent back by the clients in their Shareable objects.
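The kind of logic an Aggregator encapsulates can be sketched framework-free. The function below is illustrative only (it is not the NVFLARE Aggregator API); it assumes each client returns a weight vector and a sample count, and combines them by sample-weighted averaging:

```python
# Minimal sketch of sample-weighted federated averaging, the kind of
# algorithm an Aggregator implements. Names are illustrative, not NVFLARE API.

def aggregate(client_results):
    """client_results: list of (weights: list[float], num_samples: int)."""
    total = sum(n for _, n in client_results)
    dim = len(client_results[0][0])
    agg = [0.0] * dim
    for weights, n in client_results:
        # Each client's contribution is scaled by its share of the samples.
        for i, w in enumerate(weights):
            agg[i] += w * (n / total)
    return agg
```

In the real framework the Aggregator instead receives each client's Shareable incrementally and produces the aggregate when the round completes.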
An Application is a named directory structure that defines the client and server configuration and any custom code required to implement the Controller/Worker workflow. Since version 2.1.0, jobs manage the deployment of apps in an experiment.
The Controller is a Python object on the FL server side that controls or coordinates Workers to perform tasks. The Controller defines the overall collaborative computing workflow: in its control logic, it assigns tasks to Workers and processes the task results they return.
Events allow for dynamic notifications to be sent to all objects that are a subclass of FLComponent. Every FLComponent is an event handler. The event mechanism is like a pub-sub mechanism that enables indirect communication between components for data sharing.
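The pub-sub character of the event mechanism can be illustrated with a minimal sketch (illustrative names only; in NVFLARE the handlers are FLComponents and events are fired through the system):

```python
# Minimal pub-sub sketch of the event mechanism: every registered handler
# receives every fired event, enabling indirect communication between
# components. Illustrative only, not the NVFLARE event API.

class EventBus:
    def __init__(self):
        self.handlers = []

    def register(self, handler):
        self.handlers.append(handler)

    def fire_event(self, event_type, data):
        # Deliver the event to all handlers; none of them needs to know
        # who fired it or who else is listening.
        for handler in self.handlers:
            handler(event_type, data)

log = []
bus = EventBus()
bus.register(lambda et, d: log.append((et, d)))
bus.register(lambda et, d: log.append(("seen", et)))
bus.fire_event("START_RUN", {"job": "demo"})
```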
The Executor is the component on the FL Client side that executes the tasks received from the Controller on the FL Server. For example, in DL training, an Executor would implement the training loop. There can be multiple Executors on a client, each designed to execute a different kind of task (training, validation/evaluation, data preparation, etc.).
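The dispatch idea, one executor per task type selected by task name, can be sketched as follows (illustrative names, not the NVFLARE Executor API):

```python
# Sketch of executor-style dispatch: each task name maps to the executor
# that handles it. Illustrative only, not the NVFLARE Executor API.

def train(data):
    return {"status": "trained", "epochs": data.get("epochs", 1)}

def validate(data):
    return {"status": "validated"}

EXECUTORS = {"train": train, "validate": validate}

def execute(task_name, data):
    executor = EXECUTORS.get(task_name)
    if executor is None:
        # Unknown tasks are reported back rather than raising.
        return {"status": "unknown_task"}
    return executor(data)
```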
Filters are used to define transformations of the data in the Shareable object when transferred between server and client and vice versa. Filters can be applied when the data is sent or received by either the client or server.
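The transformation-chain idea can be sketched with plain dictionaries (illustrative only; NVFLARE filters operate on Shareable objects through the framework's filter interface):

```python
# Sketch of a filter chain: each filter transforms the payload before it
# is sent or after it is received. Illustrative, not the NVFLARE Filter API.

def exclude_keys(keys):
    """Filter that drops the given keys, e.g. to withhold private fields."""
    def f(payload):
        return {k: v for k, v in payload.items() if k not in keys}
    return f

def scale_weights(factor):
    """Filter that rescales a 'weights' entry if present."""
    def f(payload):
        out = dict(payload)
        if "weights" in out:
            out["weights"] = [w * factor for w in out["weights"]]
        return out
    return f

def apply_filters(payload, filters):
    for flt in filters:
        payload = flt(payload)
    return payload
```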
FLAdminAPI is a wrapper for admin commands that can be issued by an admin client to the FL server. You can use a provisioned admin client's certs and keys to initialize an instance of FLAdminAPI to programmatically submit commands to the FL server. The FLARE API, introduced in version 2.3.0, is a redesigned version of this.
FLARE API is a redesigned version of FLAdminAPI intended to provide a better user experience for issuing admin commands to the FL server.
FLARE Console (previously referred to as Admin Console or Admin Client)
The FLARE Console is used to orchestrate the FL study, including starting and stopping the server and clients and checking their status, deploying applications, and managing FL experiments.
Most component types are subclasses of FLComponent. You can create your own subclass of FLComponent for various purposes like listening to certain events and handling data.
FLContext is one of the key features of NVIDIA FLARE and is available to every method of all FLComponent types (Controller, Aggregator, Executor, Filter, Widget, …). An FLContext object contains contextual information of the FL environment: overall system settings (peer name, job id / run number, workspace location, etc.). FLContext also contains an important object called Engine, through which you can access important services provided by the system (e.g. fire events, get all available client names, send aux messages, etc.).
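The shape of this idea, a context that carries environment properties plus a handle to system services, can be sketched as follows (illustrative names only, not the FLContext API):

```python
# Sketch of a context object carrying environment props plus a handle to
# system services, mirroring the FLContext idea. Illustrative names only.

class Engine:
    """Stand-in for the system services object reachable via the context."""
    def __init__(self, clients):
        self._clients = clients

    def get_clients(self):
        return list(self._clients)

class Context:
    def __init__(self, engine):
        self._props = {}
        self._engine = engine

    def set_prop(self, key, value):
        self._props[key] = value

    def get_prop(self, key, default=None):
        return self._props.get(key, default)

    def get_engine(self):
        return self._engine

ctx = Context(Engine(["site-1", "site-2"]))
ctx.set_prop("job_id", "job-001")
```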
High Availability and Server Failover is a feature introduced in NVIDIA FLARE 2.1.0 for FL server failover: an Overseer coordinates multiple FL servers so that if the active server fails, another can take over.
Jobs contain the apps, the information about which app(s) to deploy to which clients or server, the resource requirements for the experiment, and everything else about the experiment.
Learnable is the result of the Federated Learning application, maintained by the server. In DL workflows, the Learnable is the aspect of the DL model to be learned; for example, the model weights are commonly the Learnable, not the model geometry. Depending on the purpose of your study, the Learnable may be any component of interest. Learnable is an abstract object that is aggregated from the clients' Shareable objects and is not DL-specific; it can be any model or object. The Learnable is managed in the Controller workflow.
Learner is a class that focuses just on the training-specific tasks and can be used to build a component to use with LearnerExecutor. This way, the communication constructs, error-code handling, etc. that are specific to NVFLARE are handled in LearnerExecutor, while the training and validation logic is kept in the Learner.
nvflare.app_common.np.np_model_locator.NPModelLocator is a component to find the models located on the server to be included for cross-site evaluation.
NVIDIA FLARE stands for NVIDIA Federated Learning Application Runtime Environment, a general-purpose framework designed for collaborative computing.
The Overseer is a subsystem that monitors the FL servers in HA mode and tells clients which FL server to connect to; it is only applicable in HA mode.
The Peer Context is the contextual information about the message sender that is sent in addition to the regular payload when the FL parties communicate with each other.
A Persistor is a component for saving the state of something; for example, a persistor implemented for the FL server can save the state of the Learnable object, such as writing a global model to disk.
The project.yaml is the file used in the provisioning process that has the Project’s specifications including the FL Server, FL Clients, and Admin Users as well as the Builders for assembling the Startup Kits.
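As a rough illustration, an abbreviated project specification might look like the following sketch. The participant names, org, and builder entries below are placeholders, and the authoritative schema is in the NVFLARE provisioning documentation:

```yaml
api_version: 3
name: example_project
description: Illustrative project specification (placeholder values)

participants:
  - name: server1
    type: server
    org: example_org
  - name: site-1
    type: client
    org: example_org
  - name: admin@example.com
    type: admin
    org: example_org
    role: project_admin

builders:
  # Builders assemble the startup kits for each participant; the exact
  # list depends on the NVFLARE version being provisioned.
  - path: nvflare.lighter.impl.static_file.StaticFileBuilder
  - path: nvflare.lighter.impl.cert.CertBuilder
```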
Provisioning is the process of setting up a secure project with startup kits for the different participants including the FL Server, FL Clients, and Admin Users.
Roles in NVIDIA FLARE
The user roles in NVIDIA FLARE include Project Admin, Org Admin, Lead Researcher, and Member Researcher and can be used to set certain privileges of system operations for different users. See the NVIDIA FLARE Security page for details.
Scatter and Gather Workflow
The Scatter and Gather Workflow is an included reference implementation of the default workflow of previous versions of NVIDIA FLARE with an FL Server aggregating results from FL Clients.
Startup kits are products of the provisioning process and contain the configuration and certificates necessary to establish secure connections between the Overseer, FL servers, FL clients, and Admin clients. These files are used to establish identity and authorization policies between server and clients. Startup kits are distributed to the Overseer, FL servers, clients, and Admin clients depending on role.
A Task is a piece of work (Python code) that is assigned by the Controller to client workers. Depending on how the Task is assigned (broadcast, send, or relay), the task will be performed by one or more clients. The logic to be performed in a Task is defined in an Executor.
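The three assignment patterns can be sketched framework-free (illustrative only, not the Controller API): broadcast gives the task to many clients at once, send targets a single client, and relay passes the result from client to client in sequence.

```python
# Sketch of the three Task assignment patterns. A "task" here is just a
# callable taking (client, previous_result). Illustrative, not NVFLARE API.

def broadcast(task, clients):
    # Every client performs the task independently.
    return {c: task(c, None) for c in clients}

def send(task, client):
    # A single client performs the task.
    return {client: task(client, None)}

def relay(task, clients):
    # Each client builds on the previous client's result.
    result = None
    for c in clients:
        result = task(c, result)
    return result

count_task = lambda client, prev: (prev or 0) + 1
```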
TB Analytics Receiver
The TB Analytics Receiver is part of NVFLARE's machine learning experiment tracking support, which uses TensorBoard as the tracking tool. Clients collect training logs and send them to the FL server; the TB Analytics Receiver is the server-side component that receives the log events from the different clients and writes them out using the TensorBoard SummaryWriter.
A Worker is capable of performing tasks (training, validation/evaluation, data preparation, etc.). Workers run on FL Clients.