nvflare.app_opt.xgboost.histogram_based_v2.controller module

class ClientStatus[source]

Bases: object

Objects of this class keep the processing status of each FL client during job execution.

class XGBController(adaptor_component_id: str, num_rounds: int, data_split_mode: int, secure_training: bool, xgb_params: dict, xgb_options: dict | None = None, disable_version_check=False, configure_task_name='config', configure_task_timeout=60, start_task_name='start', start_task_timeout=10, job_status_check_interval: float = 2.0, max_client_op_interval: float = 600.0, progress_timeout: float = 3600.0, client_ranks=None)[source]

Bases: Controller

Constructor

For the meaning of the XGBoost parameters, refer to the documentation for the train API: https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.train

Parameters:
  • adaptor_component_id – the component ID of the server target adaptor

  • num_rounds – number of rounds

  • data_split_mode – 0 for horizontal/row-split, 1 for vertical/column-split

  • secure_training – if true, secure training is enabled

  • xgb_params – the params argument for the train method

  • xgb_options – all other arguments for the train method are passed through this dictionary

  • disable_version_check – if true, the XGBoost version check for secure training is skipped

  • configure_task_name – name of the configure task

  • configure_task_timeout – time to wait for clients’ responses to the configure task before timeout

  • start_task_name – name of the start task

  • start_task_timeout – time to wait for clients’ responses to the start task before timeout

  • job_status_check_interval – how often to check client statuses of the job

  • max_client_op_interval – max amount of time allowed between XGB ops from a client

  • progress_timeout – the maximum amount of time allowed for the workflow to not make any progress. In other words, at least one participating client must have made progress during this time. Otherwise, the workflow will be considered to be in trouble and the job will be aborted.

  • client_ranks – client rank assignments. If specified, must be a dict of client_name => rank. If not specified, client ranks will be randomly assigned. No matter how assigned, ranks must be consecutive integers, starting from 0.
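The rank constraint above (consecutive integers starting from 0) can be checked before a job is submitted. The following is a minimal sketch in plain Python, not part of NVFlare: the helpers `validate_client_ranks` and `assign_random_ranks` are hypothetical, as are the component ID, client names, and XGBoost parameter values in the argument dictionary.

```python
import json
import random


def validate_client_ranks(client_ranks: dict) -> None:
    """Raise ValueError unless the ranks are exactly 0..N-1, one per client.

    Hypothetical helper for illustration; not an NVFlare function.
    """
    ranks = list(client_ranks.values())
    # Reject non-integers (and bools, which Python treats as ints).
    if not all(isinstance(r, int) and not isinstance(r, bool) for r in ranks):
        raise ValueError(f"ranks must be integers, got {ranks}")
    if sorted(ranks) != list(range(len(ranks))):
        raise ValueError(
            f"ranks must be consecutive integers starting from 0, got {sorted(ranks)}"
        )


def assign_random_ranks(client_names: list, seed=None) -> dict:
    """Randomly assign ranks 0..N-1 to the given client names."""
    rng = random.Random(seed)
    shuffled = list(client_names)
    rng.shuffle(shuffled)
    return {name: rank for rank, name in enumerate(shuffled)}


# Illustrative constructor arguments; every value below is an assumption.
controller_args = {
    "adaptor_component_id": "xgb_adaptor",  # hypothetical component ID
    "num_rounds": 10,
    "data_split_mode": 0,  # 0 = horizontal/row-split, 1 = vertical/column-split
    "secure_training": False,
    "xgb_params": {"max_depth": 3, "eta": 0.1, "objective": "binary:logistic"},
    "client_ranks": {"site-1": 0, "site-2": 1},  # hypothetical client names
}

validate_client_ranks(controller_args["client_ranks"])
print(json.dumps(controller_args, indent=2))
```

Validating the mapping up front turns a misconfigured rank assignment into an immediate error instead of a failure after the job has started.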

control_flow(abort_signal: Signal, fl_ctx: FLContext)[source]

This is the control flow of the XGB Controller. To ensure smooth XGB execution:
  • ensure that all clients are online and ready to go before starting the server

  • ensure that the server is started and ready to take requests before asking clients to start operation

  • monitor the health of the clients

  • if anything goes wrong, terminate the job

Parameters:
  • abort_signal – abort signal that is used to notify components to abort

  • fl_ctx – FL context

Returns: None
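The health-monitoring step can be pictured as a periodic check driven by the constructor's `max_client_op_interval` and `progress_timeout` values. The sketch below is an illustration under stated assumptions, not NVFlare's implementation: `ClientRecord` and `check_job` are hypothetical names, and the exact rules the controller applies may differ.

```python
from dataclasses import dataclass


@dataclass
class ClientRecord:
    """Hypothetical per-client record; not the library's ClientStatus."""

    last_op_time: float  # timestamp of the client's most recent XGB op
    done: bool = False   # whether the client has finished its XGB work


def check_job(clients, now, max_client_op_interval=600.0, progress_timeout=3600.0):
    """Return "done", "ok", or a string starting with "abort:".

    Assumed rules, taken from the parameter descriptions:
      - an unfinished client may not stay silent longer than max_client_op_interval
      - at least one client must have made progress within progress_timeout
    """
    if clients and all(c.done for c in clients.values()):
        return "done"

    for name, c in clients.items():
        if not c.done and now - c.last_op_time > max_client_op_interval:
            return f"abort: client {name} exceeded max_client_op_interval"

    # Latest time at which any client made progress.
    latest = max((c.last_op_time for c in clients.values()), default=0.0)
    if now - latest > progress_timeout:
        return "abort: no client made progress within progress_timeout"

    return "ok"


clients = {
    "site-1": ClientRecord(last_op_time=100.0),
    "site-2": ClientRecord(last_op_time=900.0),
}
print(check_job(clients, now=1000.0))
```

In a real controller a check of this kind would run every `job_status_check_interval`, and an abort result would lead to the job being terminated.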

get_adaptor(fl_ctx: FLContext)[source]

handle_event(event_type: str, fl_ctx: FLContext)[source]

Handles events.

Parameters:
  • event_type (str) – event type fired by workflow.

  • fl_ctx (FLContext) – FLContext information.

process_result_of_unknown_task(client: Client, task_name: str, client_task_id: str, result: Shareable, fl_ctx: FLContext)[source]

Process result when no task is found for it.

This is called when a result submission is received from a client, but no standing task can be found for it in the task queue.

This could happen when:
  • the client’s submission is too late

  • the task is already completed

  • the Controller lost the task, e.g. the Server is restarted

Parameters:
  • client – the client that the result comes from

  • task_name – the name of the task

  • client_task_id – ID of the task

  • result – the result from the client

  • fl_ctx – the FL context that comes with the client’s submission

start_controller(fl_ctx: FLContext)[source]

Starts the controller.

This method is called at the beginning of the RUN.

Parameters:
  • fl_ctx – the FL context. You can use this context to access services provided by the framework. For example, you can get Command Register from it and register your admin command modules.

stop_controller(fl_ctx: FLContext)[source]

Stops the controller.

This method is called right before the RUN is ended.

Parameters:
  • fl_ctx – the FL context. You can use this context to access services provided by the framework. For example, you can get Command Register from it and unregister your admin command modules.