.. _migration_guide:

################
Migration Guide
################

This guide covers API and configuration changes when upgrading between FLARE releases.

Upgrading from 2.7.2 to 2.8.0
=============================

Python and Removed Legacy Surfaces
----------------------------------

FLARE 2.8.0 targets Python 3.10 through 3.14. Python 3.9 is no longer listed as
a supported development target.
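
A deployment or CI script can check the interpreter before installing FLARE.
The following is a minimal sketch: the helper name ``is_supported_python`` is
hypothetical and not part of FLARE, and the bounds simply mirror the range
stated above.

.. code-block:: python

   import sys

   # Inclusive range stated in this guide for FLARE 2.8.0.
   MIN_PYTHON = (3, 10)
   MAX_PYTHON = (3, 14)


   def is_supported_python(version=None):
       """Return True if the (major, minor) pair is inside the stated range."""
       version = sys.version_info if version is None else version
       major_minor = tuple(version[:2])
       return MIN_PYTHON <= major_minor <= MAX_PYTHON

Calling ``is_supported_python()`` with no argument checks the running
interpreter; passing a tuple such as ``(3, 9)`` checks a specific version.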

The deprecated FLAdminAPI surface has been removed. Use the FLARE API, Recipe
API, Client API, and ``nvflare`` CLI workflows for new automation.

HA/Overseer code has also been removed from the 2.8 branch.

Client API Subprocess Timeout Validation
----------------------------------------

Subprocess-mode Client API jobs now validate two large-model safety settings at
job initialization:

- ``download_complete_timeout`` must not be ``None``. The subprocess must stay
  alive after ``send_to_peer()`` is acknowledged so that the server can finish
  downloading tensors from the subprocess ``DownloadService``.
- ``max_resends`` must not be ``None`` when using ``ClientAPILauncherExecutor``.
  Unlimited resends can turn one delayed large-model transfer into an unbounded
  series of replacement download transactions.

If your 2.7.x job explicitly set either value to ``None``, update it before
running on 2.8.0. Recipe-based external-process jobs already serialize the
default ``max_resends=3`` in executor args, so the following setting is only
needed when overriding a previous explicit ``None`` or choosing a different
retry budget:

.. code-block:: python

   recipe.add_client_config({
       "download_complete_timeout": 1800,
       "max_resends": 3,  # finite non-negative integer; 0 disables retries
   })
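
Before upgrading, it may help to scan existing client config dicts for the two
values that 2.8.0 rejects. The sketch below is illustrative only: the helper
name ``check_subprocess_settings`` is hypothetical, it is not FLARE's own
validation, and it ignores keys that are absent because this guide only calls
out explicit ``None`` values.

.. code-block:: python

   def check_subprocess_settings(client_config):
       """Return a list of problems found in a client config dict."""
       problems = []
       # Flag only explicit None values; absent keys fall back to defaults.
       for key in ("download_complete_timeout", "max_resends"):
           if key in client_config and client_config[key] is None:
               problems.append(f"{key} must not be None")

       # max_resends is described above as a finite non-negative integer.
       max_resends = client_config.get("max_resends")
       if max_resends is not None:
           is_int = isinstance(max_resends, int) and not isinstance(max_resends, bool)
           if not is_int or max_resends < 0:
               problems.append("max_resends must be a non-negative integer")
       return problems

An empty list means the dict contains none of the values described above as
invalid.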

For large tensor or NumPy payloads, also keep the related streaming timeouts
consistent. If you explicitly raise ``tensor_streaming_per_request_timeout`` or
``np_streaming_per_request_timeout``, set ``PEER_READ_TIMEOUT`` and
``download_complete_timeout`` to values at least as large as the configured
streaming per-request timeout, and keep ``tensor_min_download_timeout`` or
``np_min_download_timeout`` at least as large as the same value.

.. code-block:: python

   recipe.add_client_config({
       "tensor_streaming_per_request_timeout": 600,
       "tensor_min_download_timeout": 600,
       "PEER_READ_TIMEOUT": 600,
       "download_complete_timeout": 1800,
       "max_resends": 3,
   })
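
The consistency rule above can be checked mechanically. The following sketch
covers the tensor settings only and assumes a plain dict of the values you set
explicitly; the helper name ``check_tensor_timeouts`` is hypothetical, and keys
you did not set are skipped because their defaults are not listed in this
guide.

.. code-block:: python

   def check_tensor_timeouts(client_config):
       """Return messages for settings smaller than the per-request timeout."""
       base_key = "tensor_streaming_per_request_timeout"
       base = client_config.get(base_key)
       if base is None:
           return []  # the per-request timeout was not raised explicitly

       dependent_keys = (
           "PEER_READ_TIMEOUT",
           "download_complete_timeout",
           "tensor_min_download_timeout",
       )
       problems = []
       for key in dependent_keys:
           value = client_config.get(key)
           # Only compare values that were set explicitly.
           if value is not None and value < base:
               problems.append(f"{key}={value} is smaller than {base_key}={base}")
       return problems

The same comparison applies to NumPy payloads with
``np_streaming_per_request_timeout`` and ``np_min_download_timeout`` in place
of the tensor keys.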

Late Retry Handling for Finished Download Refs
----------------------------------------------

FLARE 2.8.0 makes finished ``DownloadService`` refs retry-safe for the same
requester. If a client completed a large download but retries because the final
EOF response was delayed, the server returns the same terminal status instead of
``INVALID_REQUEST`` / ``no ref found``. This is an internal reliability fix and
does not require job-code changes, but it is most effective when the subprocess
timeouts above are configured consistently for very large models.
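
The retry-safe behavior can be pictured with a toy model. This is a conceptual
sketch only and not FLARE's implementation: the class ``FinishedRefCache``, its
methods, and the ``"EOF"`` status string are all hypothetical names chosen for
illustration.

.. code-block:: python

   class FinishedRefCache:
       """Toy model: remember the terminal status of finished refs."""

       def __init__(self):
           # Maps (ref_id, requester) to the terminal status already returned.
           self._terminal = {}

       def finish(self, ref_id, requester, status):
           """Record that this requester finished the download."""
           self._terminal[(ref_id, requester)] = status

       def handle_request(self, ref_id, requester):
           """Answer a request that arrives after the ref has finished."""
           # A late retry from the same requester gets the same terminal
           # status; any other requester is still treated as unknown.
           return self._terminal.get((ref_id, requester), "INVALID_REQUEST")

In this model, repeating a request after completion is idempotent for the
original requester, which is the property the 2.8.0 fix provides.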

Upcoming Main-Branch Changes
============================

FLARE API Compatibility Note
----------------------------

On the current ``main`` branch, :class:`NoConnection<nvflare.fuel.flare_api.api_spec.NoConnection>`
now subclasses Python's built-in ``ConnectionError`` instead of directly subclassing
``Exception``.

Impact:

- Existing code that catches ``ConnectionError`` will now also catch
  ``NoConnection``.
- Existing code that catches ``NoConnection`` continues to work unchanged.

If your application distinguishes FLARE connection failures from broader OS or
network exceptions, review any broad ``except ConnectionError:`` handlers before
upgrading to the next release built from ``main``.
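
The effect on handler ordering can be shown without a FLARE installation. In
the sketch below, ``NoConnection`` is a local stand-in that only mirrors the
new base class, and ``classify`` is a hypothetical helper; in real code you
would import ``NoConnection`` from ``nvflare.fuel.flare_api.api_spec``.

.. code-block:: python

   class NoConnection(ConnectionError):
       """Stand-in that mirrors the new base class; not the real FLARE class."""


   def classify(exc):
       """Report which handler runs when the specific handler comes first."""
       try:
           raise exc
       except NoConnection:
           # Listed first so FLARE failures are not absorbed by the broad handler.
           return "flare"
       except ConnectionError:
           return "network"

If the two ``except`` clauses were swapped, the ``ConnectionError`` handler
would catch ``NoConnection`` as well and the specific handler would never run.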

FLARE API Lifecycle Restriction
-------------------------------

On the current ``main`` branch, :meth:`Session.shutdown<nvflare.fuel.flare_api.api_spec.SessionSpec.shutdown>`
and :meth:`Session.restart<nvflare.fuel.flare_api.api_spec.SessionSpec.restart>`
are now restricted to ``TargetType.SERVER`` only.

Impact:

- Existing callers that pass ``TargetType.ALL`` or ``TargetType.CLIENT`` will now fail.
- Server-scoped lifecycle control continues to work unchanged.
- :meth:`Session.shutdown_system<nvflare.fuel.flare_api.api_spec.SessionSpec.shutdown_system>`
  is unchanged and still supports whole-system shutdown.

To control the lifecycle of an entire local POC deployment, use the POC
start/stop flow instead of the general system admin API.
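
To find call sites affected by this restriction, a simple text scan is often
enough as a first pass. The sketch below is a rough heuristic under stated
assumptions: ``find_restricted_calls`` is a hypothetical helper, and it only
detects calls that name ``TargetType.ALL`` or ``TargetType.CLIENT`` on the same
line as ``shutdown(`` or ``restart(``.

.. code-block:: python

   import re

   # Matches .shutdown(...) or .restart(...) whose arguments mention
   # TargetType.ALL or TargetType.CLIENT on the same line.
   _RESTRICTED_CALL = re.compile(r"\.(shutdown|restart)\([^)]*TargetType\.(ALL|CLIENT)")


   def find_restricted_calls(source_text):
       """Return (line_number, line) pairs that may need updating."""
       return [
           (number, line.strip())
           for number, line in enumerate(source_text.splitlines(), start=1)
           if _RESTRICTED_CALL.search(line)
       ]

Calls that pass the target through a variable or span several lines are not
detected, so treat an empty result as a hint rather than proof.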

CLI Startup Kit Resolution Change
---------------------------------

On the current ``main`` branch, server-connected CLI commands use a shared
active startup kit registry in ``~/.nvflare/config.conf``.

Impact:

- Use ``nvflare config add <id> <startup-kit-dir>`` and
  ``nvflare config use <id>`` to register and activate a startup kit.
- ``nvflare config -d/--startup_kit_dir`` remains accepted for compatibility
  with 2.7.x scripts, but is deprecated.
- ``NVFLARE_STARTUP_KIT_DIR`` remains an automation override and takes
  precedence over the active registry entry when set.
- ``nvflare config -jt/--job_templates_dir`` remains accepted for compatibility
  with 2.7.x scripts, but job template config is deprecated.
- Root ``nvflare config`` continues to manage local settings such as the POC
  workspace. Startup kit paths are managed by ``nvflare config`` subcommands
  such as ``nvflare config add`` and ``nvflare config use``.

If you use shell profiles or CI settings that export ``NVFLARE_STARTUP_KIT_DIR``,
review them before upgrading because they override the active registry entry.
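
The precedence between the environment override and the active registry entry
can be sketched as follows. This is an illustration of the rule described
above, not the CLI's actual resolution code: ``resolve_startup_kit`` is a
hypothetical helper, and treating an empty variable as unset is an assumption.

.. code-block:: python

   import os


   def resolve_startup_kit(active_registry_dir, environ=None):
       """Return (path, source) following the precedence described above."""
       environ = os.environ if environ is None else environ
       override = environ.get("NVFLARE_STARTUP_KIT_DIR")
       # Assumption: an empty value counts as "not set".
       if override:
           return override, "NVFLARE_STARTUP_KIT_DIR"
       return active_registry_dir, "active registry entry"

Logging the returned source in CI makes it easier to notice when an exported
variable is silently overriding the kit selected with ``nvflare config use``.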

CLI Config Flag Compatibility
-----------------------------

On the current ``main`` branch, ``nvflare config`` keeps the 2.7.x POC
workspace flag names.

Impact:

- ``-pw`` and ``--poc_workspace_dir`` remain the supported flags for setting
  the POC workspace.
- The interim development-only ``--poc.workspace`` spelling is not part of the
  public compatibility contract.

If you have older scripts that use ``-pw`` or ``--poc_workspace_dir``, they
continue to work.

Client Disable Semantics
------------------------

``nvflare system remove-client`` is not exposed as a supported public CLI
command. The legacy interactive-console ``remove_client`` command is hidden
from normal help and remains a registry cleanup operation only: it releases
the active token so the client can register again. It does not stop the client
process, revoke credentials, or prevent reconnect.

Use the new durable access-control commands when the intent is to keep a client
out of the federation:

- ``nvflare system disable-client <client> --force`` persists a disabled flag
  in the server workspace, removes any active registry entry, and rejects
  later registration or heartbeat from that client.
- ``nvflare system enable-client <client> --force`` clears the disabled flag so
  the client can rejoin on the next registration or heartbeat.

This is operational disablement, not certificate revocation.

Study Name Validation Relaxation
--------------------------------

On the current ``main`` branch, study names now allow underscores in internal
positions, so names such as ``my_study`` are valid.

Impact:

- ``project.yml`` validation now accepts study names with internal underscores.
- Login and study-scoped authorization paths accept the same names.

If you maintain external validation or naming policy around study identifiers,
update those checks to match the new rule before upgrading.
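
An external naming check might be updated along the following lines. The
pattern is an assumption made for illustration: it permits lowercase letters
and digits, with hyphens or underscores only in internal positions. It is not
the rule FLARE enforces, so confirm the allowed character set and any length
limit against the release you deploy.

.. code-block:: python

   import re

   # Assumed external policy, not FLARE's validator: the name starts and ends
   # with a lowercase letter or digit; "-" and "_" may appear in between.
   STUDY_NAME_RE = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


   def is_valid_study_name(name):
       """Return True if the whole name matches the assumed policy."""
       return STUDY_NAME_RE.fullmatch(name) is not None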

Site Log Configuration Restriction
----------------------------------

On the current ``main`` branch, :meth:`Session.configure_site_log<nvflare.fuel.flare_api.api_spec.SessionSpec.configure_site_log>`
and the corresponding ``nvflare system log-config`` path now accept only simple
log levels and built-in log modes.

Impact:

- JSON ``dictConfig`` payloads are no longer accepted for site-wide log changes.
- File-path based logging configs are no longer accepted for site-wide log changes.
- Supported values remain the standard log levels plus built-in modes such as
  ``concise``, ``msg_only``, ``full``, ``verbose``, and ``reload``.

If you previously used advanced JSON/file-based configs with
``configure_site_log``, switch to the supported level/mode values before
upgrading to the next release built from ``main``.
For dict-based or file-path logging, use ``configure_job_log`` on a running job instead.
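
Automation that forwards user-supplied values to ``configure_site_log`` can
screen them first. The sketch below is a hedged pre-check, not FLARE's own
validation: ``looks_like_simple_log_value`` is a hypothetical helper, the mode
set contains only the modes named above (the real list may be longer), and
case-insensitive matching is an assumption.

.. code-block:: python

   # Built-in modes named in this guide; the full list may be longer.
   KNOWN_MODES = {"concise", "msg_only", "full", "verbose", "reload"}
   # Standard Python logging level names.
   STANDARD_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


   def looks_like_simple_log_value(value):
       """Return True for a standard level name or one of the known modes."""
       if not isinstance(value, str):
           return False  # dict payloads are no longer accepted site-wide
       text = value.strip()
       return text.upper() in STANDARD_LEVELS or text.lower() in KNOWN_MODES

Values that fail this check, such as a JSON string or a file path, are
candidates for ``configure_job_log`` instead.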

POC Start Default Service Clarification
---------------------------------------

On the current ``main`` branch, the documented default behavior of
``nvflare poc start`` is clarified to reflect the actual runtime behavior:
the default start set is the server plus client services, not every
participant directory under the workspace.

Impact:

- Running ``nvflare poc start`` with no explicit ``-p`` / ``--service`` starts
  the server and clients.
- Admin consoles are not started unless explicitly selected.

This is a documentation/help clarification, not a runtime behavior change.

Upgrading from 2.7.0/2.7.1 to 2.7.2
======================================

Recipe API Changes
------------------

**initial_model renamed to model**

The ``initial_model`` parameter in all recipes has been renamed to ``model`` for clarity:

.. code-block:: python

    # Before (2.7.0/2.7.1)
    recipe = FedAvgRecipe(
        ...
        initial_model=SimpleNetwork(),
    )

    # After (2.7.2)
    recipe = FedAvgRecipe(
        ...
        model=SimpleNetwork(),
    )
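
If recipe arguments are assembled in a shared dict, the rename can be applied
in one place. The helper below is a minimal sketch with a hypothetical name,
``rename_initial_model``; it works on a plain dict and does not import FLARE.

.. code-block:: python

    def rename_initial_model(kwargs):
        """Return a copy of recipe keyword arguments using the 2.7.2 name."""
        updated = dict(kwargs)  # leave the caller's dict untouched
        if "initial_model" in updated:
            if "model" in updated:
                raise ValueError("pass either 'initial_model' or 'model', not both")
            updated["model"] = updated.pop("initial_model")
        return updated

The result can then be unpacked into the recipe constructor with ``**``.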

The ``model`` parameter now also accepts dict-based configuration with optional pretrained checkpoint:

.. code-block:: python

    recipe = FedAvgRecipe(
        ...
        model={"path": "my_module.MyModel", "args": {"hidden_size": 256}},
        initial_ckpt="pretrained.pt",
    )

**PTFedAvgEarlyStopping merged into PTFedAvg**

``PTFedAvgEarlyStopping`` has been merged into ``PTFedAvg`` with InTime aggregation support.
A backward-compatible alias is provided, but new code should use ``PTFedAvg``:

.. code-block:: python

    # Before
    from nvflare.app_opt.pt.fedavg_early_stopping import PTFedAvgEarlyStopping
    controller = PTFedAvgEarlyStopping(...)

    # After
    from nvflare.app_opt.pt.fedavg import PTFedAvg
    controller = PTFedAvg(...)

MONAI Integration
------------------

The separate ``nvflare-monai`` wheel package is deprecated. Use the Client API directly
for MONAI integration. See the updated examples in ``examples/advanced/monai/`` and the
`MONAI Migration Guide <https://github.com/NVIDIA/NVFlare/blob/main/integration/monai/MIGRATION.md>`_.

New Features (No Migration Required)
--------------------------------------

The following 2.7.2 features work automatically with no code changes:

- **TensorDownloader**: Transparent memory optimization for PyTorch model weight transfer.
  See :ref:`tensor_downloader`.
- **Server-side memory cleanup**: Automatic garbage collection and heap trimming.
  See :doc:`/programming_guide/memory_management`.

Backward Compatibility
-----------------------

- **Job Config API**: Existing ``FedJob``-based configurations continue to work alongside the new Recipe API.
- **Config-based Jobs**: JSON/YAML configuration-based jobs continue to work as before.
- **Executor/ModelLearner APIs**: Still functional but no longer the recommended pattern. Use Recipe API + Client API for new projects.

For the full list of changes, see the :doc:`What's New in 2.7.2 </release_notes/flare_272>` release notes.

Upgrading from 2.5/2.6 to 2.7
================================

FLARE 2.7.0 introduced several major changes:

- **Job Recipe API** (technical preview): A higher-level API for creating FL jobs. See :ref:`job_recipe`.
- **Client API** is now the recommended pattern for all new FL jobs.
- **Hierarchical FL**: New relay-based communication hierarchy for large-scale deployments.
  See :ref:`flare_hierarchical_architecture`.
- **Edge & Mobile**: Federated training on mobile devices (iOS/Android) with ExecuTorch.
  See :ref:`mobile_training`.
- **File Streaming**: Pull-based file download for large model transfers.
  See :ref:`file_streaming`.

For migrating from the older FLAdminAPI to the FLARE API, see :doc:`Migrating to FLARE API </programming_guide/migrating_to_flare_api>`.

For the full list of 2.7.0 changes, see :doc:`/release_notes/flare_270`.