.. _migration_guide:

################
Migration Guide
################

This guide covers API and configuration changes when upgrading between FLARE releases.

Upgrading from 2.7.2 to 2.8.0
=============================

Python and Removed Legacy Surfaces
----------------------------------

FLARE 2.8.0 targets Python 3.10 through 3.14. Python 3.9 is no longer listed as a supported development target.

The deprecated FLAdminAPI surface has been removed. Use the FLARE API, Recipe API, Client API, and ``nvflare`` CLI workflows for new automation. HA/Overseer code has also been removed from the 2.8 branch.

Client API Subprocess Timeout Validation
----------------------------------------

Subprocess-mode Client API jobs now validate two large-model safety settings at job initialization:

- ``download_complete_timeout`` must not be ``None``. The subprocess must stay alive after ``send_to_peer()`` ACKs so the server can finish downloading tensors from the subprocess ``DownloadService``.
- ``max_resends`` must not be ``None`` when using ``ClientAPILauncherExecutor``. Unlimited resends can turn one delayed large-model transfer into an unbounded series of replacement download transactions.

If your 2.7.x job explicitly set either value to ``None``, update it before running on 2.8.0. Recipe-based external-process jobs already serialize the default ``max_resends=3`` in executor args, so the following setting is only needed when overriding a previous explicit ``None`` or choosing a different retry budget:

.. code-block:: python

    recipe.add_client_config({
        "download_complete_timeout": 1800,
        "max_resends": 3,  # finite non-negative integer; 0 disables retries
    })

For large tensor or NumPy payloads, also keep the related streaming timeouts consistent.
If you explicitly raise ``tensor_streaming_per_request_timeout`` or ``np_streaming_per_request_timeout``, set ``PEER_READ_TIMEOUT`` and ``download_complete_timeout`` to values at least as large as the configured streaming per-request timeout, and keep ``tensor_min_download_timeout`` or ``np_min_download_timeout`` at least as large as the same value.

.. code-block:: python

    recipe.add_client_config({
        "tensor_streaming_per_request_timeout": 600,
        "tensor_min_download_timeout": 600,
        "PEER_READ_TIMEOUT": 600,
        "download_complete_timeout": 1800,
        "max_resends": 3,
    })

Late Retry Handling for Finished Download Refs
----------------------------------------------

FLARE 2.8.0 makes finished ``DownloadService`` refs retry-safe for the same requester. If a client completed a large download but retries because the final EOF response was delayed, the server returns the same terminal status instead of ``INVALID_REQUEST`` / ``no ref found``.

This is an internal reliability fix and does not require job-code changes, but it is most effective when the subprocess timeouts above are configured consistently for very large models.

Upcoming Main-Branch Changes
============================

FLARE API Compatibility Note
----------------------------

On the current ``main`` branch, :class:`NoConnection` now subclasses Python's built-in ``ConnectionError`` instead of directly subclassing ``Exception``.

Impact:

- Existing code that catches ``ConnectionError`` will now also catch ``NoConnection``.
- Existing code that catches ``NoConnection`` continues to work unchanged.

If your application distinguishes FLARE connection failures from broader OS or network exceptions, review any broad ``except ConnectionError:`` handlers before upgrading to the next release built from ``main``.

FLARE API Lifecycle Restriction
-------------------------------

On the current ``main`` branch, :meth:`Session.shutdown` and :meth:`Session.restart` are now restricted to ``TargetType.SERVER`` only.

Impact:

- Existing callers that pass ``TargetType.ALL`` or ``TargetType.CLIENT`` will now fail.
- Server-scoped lifecycle control continues to work unchanged.
- :meth:`Session.shutdown_system` is unchanged and still supports whole-system shutdown.

For whole local PoC lifecycle control, use the PoC start/stop flow instead of the general system admin API.

CLI Startup Kit Resolution Change
---------------------------------

On the current ``main`` branch, server-connected CLI commands use a shared active startup kit registry in ``~/.nvflare/config.conf``.

Impact:

- Use ``nvflare config add `` and ``nvflare config use `` to register and activate a startup kit.
- ``nvflare config -d/--startup_kit_dir`` remains accepted for compatibility with 2.7.x scripts, but is deprecated.
- ``NVFLARE_STARTUP_KIT_DIR`` remains an automation override and takes precedence over the active registry entry when set.
- ``nvflare config -jt/--job_templates_dir`` remains accepted for compatibility with 2.7.x scripts, but job template config is deprecated.
- Root ``nvflare config`` continues to manage local settings such as the POC workspace. Startup kit paths are managed by the ``nvflare config`` subcommands.

If you use shell profiles or CI settings that export ``NVFLARE_STARTUP_KIT_DIR``, review them before upgrading because they override the active registry entry.

CLI Config Flag Compatibility
-----------------------------

On the current ``main`` branch, ``nvflare config`` keeps the 2.7.x POC workspace flag names.

Impact:

- ``-pw`` and ``--poc_workspace_dir`` remain the supported flags for setting the POC workspace.
- The interim development-only ``--poc.workspace`` spelling is not part of the public compatibility contract.

If you have older scripts that use ``-pw`` or ``--poc_workspace_dir``, they continue to work.

Client Disable Semantics
------------------------

``nvflare system remove-client`` is not exposed as a supported public CLI command.
The legacy interactive-console ``remove_client`` command is hidden from normal help and remains a registry cleanup operation only: it releases the active token so the client can register again. It does not stop the client process, revoke credentials, or prevent reconnect.

Use the new durable access-control commands when the intent is to keep a client out of the federation:

- ``nvflare system disable-client --force`` persists a disabled flag in the server workspace, removes any active registry entry, and rejects later registration or heartbeat from that client.
- ``nvflare system enable-client --force`` clears the disabled flag so the client can rejoin on the next registration or heartbeat.

This is operational disablement, not certificate revocation.

Study Name Validation Relaxation
--------------------------------

On the current ``main`` branch, study names now allow underscores in internal positions, so names such as ``my_study`` are valid.

Impact:

- ``project.yml`` validation now accepts study names with internal underscores.
- Login and study-scoped authorization paths will accept the same names.

If you maintain external validation or naming policy around study identifiers, update those checks to match the new rule before upgrading.

Site Log Configuration Restriction
----------------------------------

On the current ``main`` branch, :meth:`Session.configure_site_log` and the corresponding ``nvflare system log-config`` path now accept only simple log levels and built-in log modes.

Impact:

- JSON ``dictConfig`` payloads are no longer accepted for site-wide log changes.
- File-path based logging configs are no longer accepted for site-wide log changes.
- Supported values remain the standard log levels plus built-in modes such as ``concise``, ``msg_only``, ``full``, ``verbose``, and ``reload``.

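
Scripts that forward user-supplied values to site-wide log configuration can screen them before calling the API. The sketch below is a hypothetical pre-flight helper and is not part of FLARE: ``is_simple_log_value``, ``STANDARD_LEVELS``, and ``BUILT_IN_MODES`` are illustrative names, the mode set contains only the modes named in the list above (the real set may be larger), and accepting lowercase level names is an assumption.

```python
# Assumption: only the built-in modes named in this guide; the real list may be longer.
BUILT_IN_MODES = {"concise", "msg_only", "full", "verbose", "reload"}

# Standard level names used by Python's logging module.
STANDARD_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def is_simple_log_value(value):
    """Return True if value looks like a plain log level or a built-in mode."""
    if not isinstance(value, str):
        # dictConfig payloads are dicts; they are not simple values.
        return False
    text = value.strip()
    # JSON strings and file paths match neither set, so they are rejected.
    return text.upper() in STANDARD_LEVELS or text.lower() in BUILT_IN_MODES
```

A value that fails this check is a candidate for ``configure_job_log`` on a running job rather than a site-wide change.
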

If you previously used advanced JSON/file-based configs with ``configure_site_log``, switch to the supported level/mode values before upgrading to the next release built from ``main``. For dict-based or file-path logging, use ``configure_job_log`` on a running job instead.

POC Start Default Service Clarification
---------------------------------------

On the current ``main`` branch, the documented default behavior of ``nvflare poc start`` is clarified to reflect the actual runtime behavior: the default start set is the server plus client services, not every participant directory under the workspace.

Impact:

- Running ``nvflare poc start`` with no explicit ``-p`` / ``--service`` starts the server and clients.
- Admin consoles are not started unless explicitly selected.

This is a documentation/help clarification, not a runtime behavior change.

Upgrading from 2.7.0/2.7.1 to 2.7.2
======================================

Recipe API Changes
------------------

**initial_model renamed to model**

The ``initial_model`` parameter in all recipes has been renamed to ``model`` for clarity:

.. code-block:: python

    # Before (2.7.0/2.7.1)
    recipe = FedAvgRecipe(
        ...
        initial_model=SimpleNetwork(),
    )

    # After (2.7.2)
    recipe = FedAvgRecipe(
        ...
        model=SimpleNetwork(),
    )

The ``model`` parameter now also accepts dict-based configuration with optional pretrained checkpoint:

.. code-block:: python

    recipe = FedAvgRecipe(
        ...
        model={"path": "my_module.MyModel", "args": {"hidden_size": 256}},
        initial_ckpt="pretrained.pt",
    )

**PTFedAvgEarlyStopping merged into PTFedAvg**

``PTFedAvgEarlyStopping`` has been merged into ``PTFedAvg`` with InTime aggregation support. A backward-compatible alias is provided, but new code should use ``PTFedAvg``:

.. code-block:: python

    # Before
    from nvflare.app_opt.pt.fedavg_early_stopping import PTFedAvgEarlyStopping
    controller = PTFedAvgEarlyStopping(...)

    # After
    from nvflare.app_opt.pt.fedavg import PTFedAvg
    controller = PTFedAvg(...)

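
For scripts that assemble recipe arguments as a dictionary, the ``initial_model`` rename described in this section can be applied mechanically. The helper below is a hypothetical sketch (``migrate_recipe_kwargs`` is not a FLARE function); it only rewrites a plain keyword dictionary and does not import or call any recipe class.

```python
def migrate_recipe_kwargs(kwargs):
    """Return a copy of kwargs with 'initial_model' renamed to 'model'."""
    migrated = dict(kwargs)  # work on a copy; the caller's dict is untouched
    if "initial_model" in migrated:
        if "model" in migrated:
            # Both spellings present: refuse to guess which one should win.
            raise ValueError("pass either 'initial_model' or 'model', not both")
        migrated["model"] = migrated.pop("initial_model")
    return migrated
```

The result can then be unpacked into the recipe constructor with ``**``; raising on a conflict keeps an ambiguous configuration from being silently resolved.
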

MONAI Integration
------------------

The separate ``nvflare-monai`` wheel package is deprecated. Use the Client API directly for MONAI integration. See the updated examples in ``examples/advanced/monai/`` and the `MONAI Migration Guide `_.

New Features (No Migration Required)
--------------------------------------

The following 2.7.2 features work automatically with no code changes:

- **TensorDownloader**: Transparent memory optimization for PyTorch model weight transfer. See :ref:`tensor_downloader`.
- **Server-side memory cleanup**: Automatic garbage collection and heap trimming. See :doc:`/programming_guide/memory_management`.

Backward Compatibility
-----------------------

- **Job Config API**: Existing ``FedJob``-based configurations continue to work alongside the new Recipe API.
- **Config-based Jobs**: JSON/YAML configuration-based jobs continue to work as before.
- **Executor/ModelLearner APIs**: Still functional but no longer the recommended pattern. Use Recipe API + Client API for new projects.

For the full list of changes, see the :doc:`What's New in 2.7.2 ` release notes.

Upgrading from 2.5/2.6 to 2.7
================================

FLARE 2.7.0 introduced several major changes:

- **Job Recipe API** (technical preview): A higher-level API for creating FL jobs. See :ref:`job_recipe`.
- **Client API** is now the recommended pattern for all new FL jobs.
- **Hierarchical FL**: New relay-based communication hierarchy for large-scale deployments. See :ref:`flare_hierarchical_architecture`.
- **Edge & Mobile**: Federated training on mobile devices (iOS/Android) with ExecuTorch. See :ref:`mobile_training`.
- **File Streaming**: Pull-based file download for large model transfers. See :ref:`file_streaming`.

For migrating from the older FLAdminAPI to the FLARE API, see :doc:`Migrating to FLARE API `.

For the full list of 2.7.0 changes, see :doc:`/release_notes/flare_270`.