nvflare.app_common.resource_managers.gpu_resource_manager module
- class GPUResourceManager(num_of_gpus: int, mem_per_gpu_in_GiB: int | float, num_gpu_key: str = 'num_of_gpus', gpu_mem_key: str = 'mem_per_gpu_in_GiB', expiration_period: int | float = 30, ignore_host: bool = False)
Bases: AutoCleanResourceManager
Resource manager for GPUs.
- Parameters:
num_of_gpus – Number of GPUs.
mem_per_gpu_in_GiB – Memory for each GPU, in GiB.
num_gpu_key – The key in the resource requirements that specifies the number of GPUs.
gpu_mem_key – The key in the resource requirements that specifies the memory per GPU.
expiration_period – Number of seconds to hold reserved resources. If check_resources is called but no allocation follows within “expiration_period” seconds, the reserved resources are released.
ignore_host – Whether to skip validation against GPUs present on the local host. Set to True in environments where the NVFlare process runs on a node without GPUs (for example, some Kubernetes deployments) but GPU resources are managed externally.
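The reserve-then-expire behavior described for expiration_period can be illustrated with a small standalone sketch. This is not NVFlare code: ToyGPUReservations, its check and allocate methods, and the injectable clock argument are hypothetical names invented for illustration, and the memory check is simplified to a per-request comparison. Only the parameter names and the default keys ('num_of_gpus', 'mem_per_gpu_in_GiB') come from the signature above.

```python
import time
import uuid

# Default requirement keys, taken from the constructor signature above.
NUM_GPU_KEY = "num_of_gpus"
GPU_MEM_KEY = "mem_per_gpu_in_GiB"


class ToyGPUReservations:
    """Hypothetical stand-in, not the NVFlare implementation."""

    def __init__(self, num_of_gpus, mem_per_gpu_in_GiB,
                 expiration_period=30, clock=time.monotonic):
        self.free_gpus = num_of_gpus
        self.mem_per_gpu = mem_per_gpu_in_GiB
        self.expiration_period = expiration_period
        self.clock = clock  # injectable so expiry can be simulated
        self.reserved = {}  # token -> (gpu_count, expiry_time)

    def _release_expired(self):
        # Return GPUs from reservations that were never allocated in time.
        now = self.clock()
        for token, (count, expiry) in list(self.reserved.items()):
            if now >= expiry:
                self.free_gpus += count
                del self.reserved[token]

    def check(self, requirements):
        """Reserve GPUs if the request fits; return a token, else None."""
        self._release_expired()
        count = requirements.get(NUM_GPU_KEY, 0)
        mem = requirements.get(GPU_MEM_KEY, 0)
        if count > self.free_gpus or mem > self.mem_per_gpu:
            return None
        token = str(uuid.uuid4())
        self.free_gpus -= count
        self.reserved[token] = (count, self.clock() + self.expiration_period)
        return token

    def allocate(self, token):
        """Confirm a live reservation; return False if it has expired."""
        self._release_expired()
        return self.reserved.pop(token, None) is not None
```

In this sketch, a token returned by check becomes useless once expiration_period seconds pass without a matching allocate call, and the GPUs it held are available to the next request.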