nvflare.app_common.executors.client_api_launcher_executor module
- class ClientAPILauncherExecutor(pipe_id: str, launcher_id: str | None = None, launch_timeout: float | None = None, task_wait_timeout: float | None = None, last_result_transfer_timeout: float = 300.0, external_pre_init_timeout: float = 300.0, peer_read_timeout: float | None = 300.0, monitor_interval: float = 0.01, read_interval: float = 0.5, heartbeat_interval: float = 5.0, heartbeat_timeout: float = 300.0, workers: int = 4, train_with_evaluation: bool = False, train_task_name: str = 'train', evaluate_task_name: str = 'validate', submit_model_task_name: str = 'submit_model', from_nvflare_converter_id: str | None = None, to_nvflare_converter_id: str | None = None, params_exchange_format: str = ExchangeFormat.NUMPY, params_transfer_type: str = TransferType.FULL, config_file_name: str = 'client_api_config.json', server_expected_format: str = ExchangeFormat.NUMPY, memory_gc_rounds: int = 0, cuda_empty_cache: bool = False, submit_result_timeout: float = 300.0, max_resends: int | None = 3, download_complete_timeout: float = 1800.0)
Bases: LauncherExecutor
Initializes the ClientAPILauncherExecutor.
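As a rough illustration of how the constructor arguments fit together, the sketch below builds a client job config entry for this executor as a Python dict and serializes it to JSON. It assumes the common NVFlare `config_fed_client.json` layout (an `executors` list whose entries carry `tasks` and an `executor` with `path` and `args`) and assumes that `ExchangeFormat.NUMPY` and `TransferType.FULL` serialize as "numpy" and "FULL". The component ids "pipe" and "launcher" are hypothetical; they must match ids of components defined elsewhere in the same config.

```python
import json

# Hypothetical client config entry for ClientAPILauncherExecutor.
# "pipe" and "launcher" are placeholder component ids (assumptions): they must
# match the ids of a Pipe and a Launcher declared elsewhere in the config.
executor_entry = {
    "tasks": ["train"],
    "executor": {
        "path": "nvflare.app_common.executors.client_api_launcher_executor.ClientAPILauncherExecutor",
        "args": {
            "pipe_id": "pipe",                  # resolves the Pipe component by id
            "launcher_id": "launcher",          # resolves the Launcher component by id
            "launch_timeout": None,             # None: no timeout for launch_task
            "params_exchange_format": "numpy",  # assumed value of ExchangeFormat.NUMPY
            "params_transfer_type": "FULL",     # assumed value of TransferType.FULL
            "submit_result_timeout": 300.0,
            "max_resends": 3,                   # must be finite and non-negative
        },
    },
}

client_config = {"executors": [executor_entry]}

# None is written as JSON null.
print(json.dumps(client_config, indent=2))
```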
- Parameters:
pipe_id (str) – Identifier for obtaining the Pipe from NVFlare components.
launcher_id (Optional[str]) – Identifier for obtaining the Launcher from NVFlare components.
launch_timeout (Optional[float]) – Timeout for the Launcher’s “launch_task” method to complete (None for no timeout).
task_wait_timeout (Optional[float]) – Timeout for retrieving the task result (None for no timeout).
last_result_transfer_timeout (float) – Timeout for transmitting the last result from an external process. This value should be greater than the time needed for sending the whole result.
external_pre_init_timeout (float) – Time to wait for the external process to call flare.init().
peer_read_timeout (float, optional) – Time to wait for the peer to accept a sent message.
monitor_interval (float) – Interval for monitoring the launcher.
read_interval (float) – Interval for reading from the pipe.
heartbeat_interval (float) – Interval for sending heartbeat to the peer.
heartbeat_timeout (float) – Timeout for waiting for a heartbeat from the peer.
workers (int) – Number of worker threads needed.
train_with_evaluation (bool) – Whether to run training with global model evaluation.
train_task_name (str) – Task name used for training.
evaluate_task_name (str) – Task name used for evaluation.
submit_model_task_name (str) – Task name used for submitting the model.
from_nvflare_converter_id (Optional[str]) – Identifier used to get the ParamsConverter from NVFlare components. This ParamsConverter is called when the model is sent from the NVFlare controller side to the executor side.
to_nvflare_converter_id (Optional[str]) – Identifier used to get the ParamsConverter from NVFlare components. This ParamsConverter is called when the model is sent from the NVFlare executor side to the controller side.
server_expected_format (str) – Format in which parameters are exchanged between the server and the client.
params_exchange_format (str) – Format in which parameters are exchanged between the client and the script.
params_transfer_type (str) – How the parameters are transferred: FULL sends the complete model parameters; DIFF sends only the difference.
config_file_name (str) – Name of the config file that attributes are written into; the Client API reads this file.
submit_result_timeout (float) – How long (seconds) the subprocess waits for CJ to acknowledge each result pipe message. With reverse PASS_THROUGH enabled, CJ ACKs immediately (creating a LazyDownloadRef takes microseconds), so 300 s is a very generous allowance. Without reverse PASS_THROUGH, CJ must download the full result before ACKing; in that case, this value should be at least as large as the expected transfer time. Recipe-based jobs can override it via recipe.add_client_config({"submit_result_timeout": N}).
max_resends (int) – Maximum number of retries after the initial result send if CJ does not ACK within submit_result_timeout. Defaults to 3. Must be a finite, non-negative integer; 0 disables retries. None means unlimited retries (unsafe for large models, because each retry creates a new download transaction) and is rejected at job initialization. Recipe-based jobs serialize this default in the executor args; override it per job via recipe.add_client_config({"max_resends": N}).
download_complete_timeout (float) – How long (seconds) the subprocess waits, after send_to_peer() is ACKed, for the server to finish downloading its tensors from the subprocess DownloadService. Without this gate, the subprocess may exit before the download completes, leaving the server with missing download refs. Defaults to 1800 s. Recipe-based jobs can override it via recipe.add_client_config({"download_complete_timeout": N}).
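The interaction between submit_result_timeout and max_resends can be pictured as a simple send loop: one initial send plus up to max_resends retries, each waiting up to submit_result_timeout seconds for an ACK. This is a hypothetical, stdlib-only sketch, not NVFlare's implementation: `send_once` is an assumed stand-in for a single pipe send that reports whether CJ ACKed in time, and `validate_max_resends` only mirrors the documented rule (in NVFlare, None is rejected at job initialization, not at send time).

```python
from typing import Callable, Optional


def validate_max_resends(max_resends: Optional[int]) -> int:
    """Hypothetical check mirroring the documented rule: finite and non-negative."""
    if max_resends is None:
        raise ValueError("max_resends=None (unlimited retries) is not allowed")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_resends, bool) or not isinstance(max_resends, int) or max_resends < 0:
        raise ValueError(f"max_resends must be a non-negative int, got {max_resends!r}")
    return max_resends


def send_with_resends(
    send_once: Callable[[float], bool],
    submit_result_timeout: float = 300.0,
    max_resends: Optional[int] = 3,
) -> int:
    """Send a result, retrying up to max_resends times if no ACK arrives.

    send_once(timeout) is a hypothetical callable that returns True if the
    peer ACKs within `timeout` seconds. Returns the attempt that was ACKed.
    """
    resends = validate_max_resends(max_resends)
    total_attempts = 1 + resends  # the initial send plus the allowed retries
    for attempt in range(1, total_attempts + 1):
        if send_once(submit_result_timeout):
            return attempt
    raise TimeoutError(f"no ACK after {total_attempts} attempts")


# Simulated peer that only ACKs the third send.
acks = iter([False, False, True])
print(send_with_resends(lambda timeout: next(acks), max_resends=3))  # prints 3
```

In this sketch, max_resends=3 allows at most four sends, so the worst-case wait before giving up is roughly four times submit_result_timeout.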
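To make params_transfer_type concrete, the sketch below contrasts FULL and DIFF on a toy model represented as a dict of floats. It is a hypothetical illustration with invented helper names (`to_transfer`, `apply_transfer`): real parameters are arrays or tensors, the strings "FULL" and "DIFF" are assumed to match the TransferType values, and the difference is assumed to be taken against the global model received for the round, which the receiver adds back to recover the full parameters.

```python
from typing import Dict

# Toy stand-in for model parameters (real models use arrays/tensors).
Params = Dict[str, float]


def to_transfer(local: Params, global_params: Params, transfer_type: str) -> Params:
    """Prepare parameters for sending: whole model (FULL) or difference (DIFF)."""
    if transfer_type == "FULL":
        return dict(local)
    if transfer_type == "DIFF":
        # Difference against the reference (global) model, per key.
        return {k: local[k] - global_params[k] for k in local}
    raise ValueError(f"unknown transfer type: {transfer_type!r}")


def apply_transfer(received: Params, global_params: Params, transfer_type: str) -> Params:
    """Recover the full parameters on the receiving side."""
    if transfer_type == "FULL":
        return dict(received)
    if transfer_type == "DIFF":
        # Add the difference back onto the same reference model.
        return {k: global_params[k] + v for k, v in received.items()}
    raise ValueError(f"unknown transfer type: {transfer_type!r}")


global_params = {"w": 1.0, "b": 0.5}
local = {"w": 1.25, "b": 0.25}
diff = to_transfer(local, global_params, "DIFF")
print(diff)  # {'w': 0.25, 'b': -0.25}
```

In this sketch, FULL is self-contained, while DIFF requires the receiver to hold the same reference model in order to reconstruct the result.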
Checks the output shareable after execution.
- Returns:
True if the output shareable looks good; False otherwise.