Migration Guide

This guide covers API and configuration changes when upgrading between FLARE releases.

Upgrading from 2.7.2 to 2.8.0

Python and Removed Legacy Surfaces

FLARE 2.8.0 targets Python 3.10 through 3.14. Python 3.9 is no longer listed as a supported development target.

The deprecated FLAdminAPI surface has been removed. Use the FLARE API, Recipe API, Client API, and nvflare CLI workflows for new automation.

HA/Overseer code has also been removed from the 2.8 branch.

Client API Subprocess Timeout Validation

Subprocess-mode Client API jobs now validate two large-model safety settings at job initialization:

  • download_complete_timeout must not be None. The subprocess must stay alive after send_to_peer() ACKs so the server can finish downloading tensors from the subprocess DownloadService.

  • max_resends must not be None when using ClientAPILauncherExecutor. Unlimited resends can turn one delayed large-model transfer into an unbounded series of replacement download transactions.

If your 2.7.x job explicitly set either value to None, update it before running on 2.8.0. Recipe-based external-process jobs already serialize the default max_resends=3 in executor args, so the following setting is only needed when overriding a previous explicit None or choosing a different retry budget:

recipe.add_client_config({
    "download_complete_timeout": 1800,
    "max_resends": 3,  # finite non-negative integer; 0 disables retries
})
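
A pre-flight check along these lines can catch explicit None values before a job is submitted to 2.8.0. This is a minimal sketch, not FLARE code: the helper name check_subprocess_settings and the plain-dict input are illustrative assumptions, and the logic mirrors only the two rules described above.

```python
def check_subprocess_settings(config: dict) -> list[str]:
    """Return problems found in explicitly set values.

    Hypothetical helper: only keys present in `config` are inspected,
    so settings left at their defaults are never reported.
    """
    problems = []
    if "download_complete_timeout" in config and config["download_complete_timeout"] is None:
        problems.append("download_complete_timeout must not be None")
    if "max_resends" in config:
        value = config["max_resends"]
        if value is None:
            problems.append("max_resends must not be None")
        # bool is a subclass of int, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append("max_resends must be a finite non-negative integer")
    return problems
```

Running the helper over the dict you pass to recipe.add_client_config, and failing the build when the returned list is non-empty, keeps the check outside the job itself.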

For large tensor or NumPy payloads, also keep the related streaming timeouts consistent. If you explicitly raise tensor_streaming_per_request_timeout or np_streaming_per_request_timeout, set PEER_READ_TIMEOUT, download_complete_timeout, and the matching tensor_min_download_timeout or np_min_download_timeout to values at least as large as that per-request timeout. For example:

recipe.add_client_config({
    "tensor_streaming_per_request_timeout": 600,
    "tensor_min_download_timeout": 600,
    "PEER_READ_TIMEOUT": 600,
    "download_complete_timeout": 1800,
    "max_resends": 3,
})
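
The relationships between these timeouts can be checked mechanically. The sketch below assumes the settings are available as a plain dict. The function name check_streaming_timeouts is hypothetical, and keys that are absent are skipped because the sketch does not know FLARE's default values.

```python
def check_streaming_timeouts(config: dict, prefix: str = "tensor") -> list[str]:
    """Check one payload family ("tensor" or "np") for timeout consistency."""
    per_request_key = f"{prefix}_streaming_per_request_timeout"
    per_request = config.get(per_request_key)
    if per_request is None:
        return []  # not raised explicitly, so there is nothing to compare
    problems = []
    dependent_keys = (
        "PEER_READ_TIMEOUT",
        "download_complete_timeout",
        f"{prefix}_min_download_timeout",
    )
    for key in dependent_keys:
        value = config.get(key)
        # Each dependent timeout should be at least the per-request timeout.
        if value is not None and value < per_request:
            problems.append(f"{key}={value} is below {per_request_key}={per_request}")
    return problems
```

Call it once with prefix="tensor" and once with prefix="np" if a job overrides both families.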

Late Retry Handling for Finished Download Refs

FLARE 2.8.0 makes finished DownloadService refs retry-safe for the same requester. If a client completed a large download but retries because the final EOF response was delayed, the server returns the same terminal status instead of INVALID_REQUEST / no ref found. This is an internal reliability fix and does not require job-code changes, but it is most effective when the subprocess timeouts above are configured consistently for very large models.

Upcoming Main-Branch Changes

FLARE API Compatibility Note

On the current main branch, NoConnection now subclasses Python’s built-in ConnectionError instead of directly subclassing Exception.

Impact:

  • Existing code that catches ConnectionError will now also catch NoConnection.

  • Existing code that catches NoConnection continues to work unchanged.

If your application distinguishes FLARE connection failures from broader OS or network exceptions, review any broad except ConnectionError: handlers before upgrading to the next release built from main.
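
The effect on handler ordering can be seen with a stand-in class. The sketch below defines its own NoConnection so that it runs without FLARE installed. In real code you would import the exception from FLARE instead, and the classify helper is purely illustrative.

```python
class NoConnection(ConnectionError):
    """Stand-in for FLARE's NoConnection as it behaves on main."""


def classify(exc: Exception) -> str:
    """Label an exception by which handler catches it."""
    try:
        raise exc
    except NoConnection:
        # Must come first: an `except ConnectionError` clause placed
        # above this one would also match NoConnection.
        return "flare"
    except ConnectionError:
        return "network"
    except Exception:
        return "other"
```

Placing the more specific handler first keeps FLARE connection failures distinguishable from OS-level errors such as ConnectionResetError.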

FLARE API Lifecycle Restriction

On the current main branch, Session.shutdown and Session.restart are now restricted to TargetType.SERVER only.

Impact:

  • Existing callers that pass TargetType.ALL or TargetType.CLIENT will now fail.

  • Server-scoped lifecycle control continues to work unchanged.

  • Session.shutdown_system is unchanged and still supports whole-system shutdown.

For whole local PoC lifecycle control, use the PoC start/stop flow instead of the general system admin API.

CLI Startup Kit Resolution Change

On the current main branch, server-connected CLI commands use a shared active startup kit registry in ~/.nvflare/config.conf.

Impact:

  • Use nvflare config add <id> <startup-kit-dir> and nvflare config use <id> to register and activate a startup kit.

  • nvflare config -d/--startup_kit_dir remains accepted for compatibility with 2.7.x scripts, but is deprecated.

  • NVFLARE_STARTUP_KIT_DIR remains an automation override and takes precedence over the active registry entry when set.

  • nvflare config -jt/--job_templates_dir remains accepted for compatibility with 2.7.x scripts, but job template config is deprecated.

  • Root nvflare config continues to manage local settings such as the POC workspace. Startup kit paths are managed by the nvflare config subcommands.

If you use shell profiles or CI settings that export NVFLARE_STARTUP_KIT_DIR, review them before upgrading because they override the active registry entry.
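
The precedence rule can be modeled in a few lines. This is a sketch of the ordering described above, not the CLI's implementation: resolve_startup_kit is a hypothetical name, the active registry entry is passed in as a plain string, and treating an empty environment value as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_startup_kit(
    active_entry: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the startup kit path that would win under the stated precedence."""
    override = env.get("NVFLARE_STARTUP_KIT_DIR")
    if override:
        # The environment variable takes precedence over the active entry.
        return override
    return active_entry
```

A CI job that exports NVFLARE_STARTUP_KIT_DIR therefore ignores whatever kit was activated with nvflare config use.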

CLI Config Flag Compatibility

On the current main branch, nvflare config keeps the 2.7.x POC workspace flag names.

Impact:

  • -pw and --poc_workspace_dir remain the supported flags for setting the POC workspace.

  • The interim development-only --poc.workspace spelling is not part of the public compatibility contract.

If you have older scripts that use -pw or --poc_workspace_dir, they continue to work.

Client Disable Semantics

nvflare system remove-client is not exposed as a supported public CLI command. The legacy interactive-console remove_client command is hidden from normal help and remains a registry cleanup operation only: it releases the active token so the client can register again. It does not stop the client process, revoke credentials, or prevent reconnect.

Use the new durable access-control commands when the intent is to keep a client out of the federation:

  • nvflare system disable-client <client> --force persists a disabled flag in the server workspace, removes any active registry entry, and rejects later registration or heartbeat from that client.

  • nvflare system enable-client <client> --force clears the disabled flag so the client can rejoin on the next registration or heartbeat.

This is operational disablement, not certificate revocation.

Study Name Validation Relaxation

On the current main branch, study names now allow underscores in internal positions, so names such as my_study are valid.

Impact:

  • project.yml validation now accepts study names with internal underscores.

  • Login and study-scoped authorization paths accept the same names.

If you maintain external validation or naming policy around study identifiers, update those checks to match the new rule before upgrading.
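
An external check that permits underscores only in internal positions might look like the sketch below. The allowed character set (lowercase letters, digits, hyphens, underscores) and the absence of a length limit are assumptions made for illustration. Confirm the actual rule against FLARE's own validator before relying on this pattern.

```python
import re

# Assumed rule: first and last characters are alphanumeric; hyphens and
# underscores may appear only between them.
_STUDY_NAME = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_study_name(name: str) -> bool:
    """Return True if `name` matches the assumed study-name pattern."""
    return _STUDY_NAME.fullmatch(name) is not None
```

Under these assumptions my_study passes, while _study and study_ are rejected because the underscore is not internal.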

Site Log Configuration Restriction

On the current main branch, Session.configure_site_log and the corresponding nvflare system log-config path now accept only simple log levels and built-in log modes.

Impact:

  • JSON dictConfig payloads are no longer accepted for site-wide log changes.

  • File-path based logging configs are no longer accepted for site-wide log changes.

  • Supported values remain the standard log levels plus built-in modes such as concise, msg_only, full, verbose, and reload.

If you previously used advanced JSON/file-based configs with configure_site_log, switch to the supported level/mode values before upgrading to the next release built from main. For dict-based or file-path logging, use configure_job_log on a running job instead.
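
To find call sites that still pass advanced configs, a simple classifier can help. The sketch below is not FLARE's validation logic. The mode set contains only the modes named above and may be incomplete, the level names are Python's standard logging levels, and case-insensitive matching is an assumption.

```python
BUILT_IN_MODES = {"concise", "msg_only", "full", "verbose", "reload"}
STANDARD_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def is_simple_site_log_value(value) -> bool:
    """Return True for a plain level or a mode name from the sets above."""
    if not isinstance(value, str):
        return False  # dictConfig payloads arrive as dicts
    text = value.strip()
    # JSON strings and file paths match neither set and are rejected.
    return text.upper() in STANDARD_LEVELS or text.lower() in BUILT_IN_MODES
```

Values that fail this check are candidates for moving to configure_job_log.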

POC Start Default Service Clarification

On the current main branch, the documentation and help text for nvflare poc start now match the actual runtime behavior: the default start set is the server plus client services, not every participant directory under the workspace.

Impact:

  • Running nvflare poc start with no explicit -p / --service starts the server and clients.

  • Admin consoles are not started unless explicitly selected.

This is a documentation/help clarification, not a runtime behavior change.

Upgrading from 2.7.0/2.7.1 to 2.7.2

Recipe API Changes

initial_model renamed to model

The initial_model parameter in all recipes has been renamed to model for clarity:

# Before (2.7.0/2.7.1)
recipe = FedAvgRecipe(
    ...
    initial_model=SimpleNetwork(),
)

# After (2.7.2)
recipe = FedAvgRecipe(
    ...
    model=SimpleNetwork(),
)
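
If recipe arguments are built as a dict in shared tooling, a small shim can apply the rename in one place during the transition. This is a hypothetical helper, not part of FLARE. It only renames the key and raises an error when both spellings are present.

```python
def migrate_recipe_kwargs(kwargs: dict) -> dict:
    """Return a copy of `kwargs` with initial_model renamed to model."""
    migrated = dict(kwargs)  # leave the caller's dict untouched
    if "initial_model" in migrated:
        if "model" in migrated:
            raise ValueError("pass either initial_model or model, not both")
        migrated["model"] = migrated.pop("initial_model")
    return migrated
```

The result can then be unpacked into the recipe constructor with **, so older call sites keep working until they are updated.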

The model parameter now also accepts a dict-based configuration, with an optional pretrained checkpoint:

recipe = FedAvgRecipe(
    ...
    model={"path": "my_module.MyModel", "args": {"hidden_size": 256}},
    initial_ckpt="pretrained.pt",
)

PTFedAvgEarlyStopping merged into PTFedAvg

PTFedAvgEarlyStopping has been merged into PTFedAvg with InTime aggregation support. A backward-compatible alias is provided, but new code should use PTFedAvg:

# Before
from nvflare.app_opt.pt.fedavg_early_stopping import PTFedAvgEarlyStopping
controller = PTFedAvgEarlyStopping(...)

# After
from nvflare.app_opt.pt.fedavg import PTFedAvg
controller = PTFedAvg(...)

MONAI Integration

The separate nvflare-monai wheel package is deprecated. Use the Client API directly for MONAI integration. See the updated examples in examples/advanced/monai/ and the MONAI Migration Guide.

New Features (No Migration Required)

The following 2.7.2 features work automatically with no code changes:

  • TensorDownloader: Transparent memory optimization for PyTorch model weight transfer. See FLARE Tensor Downloader.

  • Server-side memory cleanup: Automatic garbage collection and heap trimming. See Memory Management.

Backward Compatibility

  • Job Config API: Existing FedJob-based configurations continue to work alongside the new Recipe API.

  • Config-based Jobs: JSON/YAML configuration-based jobs continue to work as before.

  • Executor/ModelLearner APIs: Still functional but no longer the recommended pattern. Use Recipe API + Client API for new projects.

For the full list of changes, see the What’s New in 2.7.2 release notes.

Upgrading from 2.5/2.6 to 2.7

FLARE 2.7.0 introduced several major changes:

  • Job Recipe API (technical preview): A higher-level API for creating FL jobs. See NVFlare Job Recipe.

  • Client API is now the recommended pattern for all new FL jobs.

  • Hierarchical FL: New relay-based communication hierarchy for large-scale deployments. See Hierarchical FLARE.

  • Edge & Mobile: Federated training on mobile devices (iOS/Android) with ExecuTorch. See Mobile Federated Training (iOS / Android).

  • File Streaming: Pull-based file download for large model transfers. See FLARE File Streaming.

For migrating from the older FLAdminAPI to the FLARE API, see Migrating to FLARE API.

For the full list of 2.7.0 changes, see What’s New in FLARE v2.7.0.