dragon.infrastructure.gpu_desc
Module for detecting GPU devices across different vendors.
Functions
find_accelerators() | Scan for accelerators across all supported vendors. |
find_amd() | Return the list of AMD GPUs reported by rocm-smi. |
find_intel() | Return the list of Intel GPUs reported by xpu-smi. |
find_nvidia() | Return the list of NVIDIA GPUs reported by nvidia-smi. |
Classes
AcceleratorDescriptor | AcceleratorDescriptor(vendor: dragon.infrastructure.gpu_desc.AccVendor = <AccVendor.UNKNOWN: 4>, device_list: list[int] = <factory>, env_str: str = '') |
- class AccEnvStr
Bases: object
- NVIDIA = 'CUDA_VISIBLE_DEVICES'
- AMD = 'ROCR_VISIBLE_DEVICES'
- HIP = 'HIP_VISIBLE_DEVICES'
- INTEL = 'ZE_AFFINITY_MASK'
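A minimal sketch of how these variable names might be used: the variable is set in a child process's environment so that the process only sees the selected devices. The `AccEnvStr` class below is a local stand-in that mirrors the constants listed above, so the example runs without Dragon installed, and `build_device_env` is a hypothetical helper, not part of `dragon.infrastructure.gpu_desc`.

```python
import os


class AccEnvStr:
    """Local stand-in mirroring the documented constants."""
    NVIDIA = "CUDA_VISIBLE_DEVICES"
    AMD = "ROCR_VISIBLE_DEVICES"
    HIP = "HIP_VISIBLE_DEVICES"
    INTEL = "ZE_AFFINITY_MASK"


def build_device_env(env_str, device_ids, base_env=None):
    """Hypothetical helper: copy an environment and restrict visible devices."""
    env = dict(os.environ if base_env is None else base_env)
    # All four variables accept a comma-separated list of device indices.
    env[env_str] = ",".join(str(d) for d in device_ids)
    return env


env = build_device_env(AccEnvStr.NVIDIA, [0, 2], base_env={})
print(env)
```

The resulting mapping could then be passed as the `env=` argument of `subprocess.Popen` when launching a worker.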
- class AcceleratorDescriptor
Bases: object
AcceleratorDescriptor(vendor: dragon.infrastructure.gpu_desc.AccVendor = <AccVendor.UNKNOWN: 4>, device_list: list[int] = <factory>, env_str: str = '')
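The signature above has the shape of a Python dataclass with a `default_factory` for `device_list`. The following is a sketch of an equivalent definition, not the Dragon source. Only `UNKNOWN = 4` is confirmed by the signature; the other `AccVendor` members and their values are assumptions made for illustration.

```python
import enum
from dataclasses import dataclass, field


class AccVendor(enum.Enum):
    # Only UNKNOWN = 4 appears in the documented signature; the other
    # members and their values are assumptions made for this sketch.
    NVIDIA = 1
    AMD = 2
    INTEL = 3
    UNKNOWN = 4


@dataclass
class AcceleratorDescriptor:
    vendor: AccVendor = AccVendor.UNKNOWN
    # default_factory gives every instance its own empty list
    device_list: list[int] = field(default_factory=list)
    env_str: str = ""


empty = AcceleratorDescriptor()
nvidia = AcceleratorDescriptor(AccVendor.NVIDIA, [0, 1], "CUDA_VISIBLE_DEVICES")
print(empty)
print(nvidia)
```

Using `default_factory` rather than a literal `[]` default avoids sharing one list between all descriptors.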
- find_nvidia() → list
Return the list of NVIDIA GPUs reported by nvidia-smi. Expected output from smi:
    . . .
    GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-ccdb6af5-102b-3fb4-4e06-8b7aaeba0578)
    GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-43539da6-e86a-d93e-8db2-f8814ef47c41)
    . . .
- Returns:
list of GPUs with IDs.
- Return type:
list
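To illustrate the kind of parsing this output calls for, here is a sketch that extracts the index, model name, and UUID from lines in the format shown above. `parse_nvidia_smi_list` is a hypothetical name and the tuple layout is an assumption; the actual return format of `find_nvidia()` may differ.

```python
import re

# Matches lines such as "GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-...)"
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: ([^)]+)\)$")


def parse_nvidia_smi_list(output):
    """Return (index, name, uuid) tuples for each GPU line in the output."""
    gpus = []
    for line in output.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:  # lines that do not describe a GPU are skipped
            index, name, uuid = match.groups()
            gpus.append((int(index), name, uuid))
    return gpus


sample = """\
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-ccdb6af5-102b-3fb4-4e06-8b7aaeba0578)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-43539da6-e86a-d93e-8db2-f8814ef47c41)
"""
gpus = parse_nvidia_smi_list(sample)
print(gpus)
```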
- find_amd() → list
Return the list of AMD GPUs reported by rocm-smi. Expected output from smi:
    . . .
    card2,0x4eda2591da9a0592
    card3,0x96ca52b5699c2baf
    . . .
- Returns:
a list of cards that can be iterated over
- Return type:
list
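The comma-separated lines above can be parsed with a small filter. This sketch assumes each card line has the form `card<N>,<hex id>`; `parse_rocm_smi_cards` is a hypothetical name, and the tuple layout is an illustration rather than the documented return value of `find_amd()`.

```python
import re

# Matches lines such as "card2,0x4eda2591da9a0592"
_CARD_LINE = re.compile(r"^card(\d+),(0x[0-9a-fA-F]+)$")


def parse_rocm_smi_cards(output):
    """Return (card number, hex id) tuples for each card line in the output."""
    cards = []
    for line in output.splitlines():
        match = _CARD_LINE.match(line.strip())
        if match:  # header and blank lines are skipped
            cards.append((int(match.group(1)), match.group(2)))
    return cards


sample = """\
card2,0x4eda2591da9a0592
card3,0x96ca52b5699c2baf
"""
cards = parse_rocm_smi_cards(sample)
print(cards)
```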
- find_intel() → list
Return the list of Intel GPUs reported by xpu-smi. Expected output from smi:
    . . .
    +-----------+--------------------------------------------------------------------------------------+
    | 2         | Device Name: Intel(R) Data Center GPU Max 1550                                       |
    |           | Vendor Name: Intel(R) Corporation                                                    |
    |           | SOC UUID: 00000000-0000-0000-0a2a-ca25127eb373                                       |
    |           | PCI BDF Address: 0000:6c:00.0                                                        |
    |           | DRM Device: /dev/dri/card2                                                           |
    |           | Function Type: physical                                                              |
    +-----------+--------------------------------------------------------------------------------------+
    | 3         | Device Name: Intel(R) Data Center GPU Max 1550                                       |
    |           | Vendor Name: Intel(R) Corporation                                                    |
    |           | SOC UUID: 00000000-0000-0000-e986-d69bb5dc50cb                                       |
    |           | PCI BDF Address: 0001:18:00.0                                                        |
    |           | DRM Device: /dev/dri/card3                                                           |
    |           | Function Type: physical                                                              |
    . . .
- Returns:
list of tuples with GPU device number and ID
- Return type:
list
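The table layout above can be walked row by row: a number in the first column starts a new device block, and the second column holds `key: value` pairs. The sketch below assumes that the "ID" in the returned tuples is the SOC UUID, which the source does not confirm; `parse_xpu_smi_table` is a hypothetical name.

```python
def parse_xpu_smi_table(output):
    """Return (device number, SOC UUID) tuples from an xpu-smi style table."""
    devices = []
    current_id = None
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders and elided lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2:
            continue
        if cells[0].isdigit():
            current_id = int(cells[0])  # first row of a new device block
        key, _, value = cells[1].partition(": ")
        if key == "SOC UUID" and current_id is not None:
            devices.append((current_id, value))
    return devices


sample = """\
+-----------+------------------------------------------------+
| 2         | Device Name: Intel(R) Data Center GPU Max 1550 |
|           | SOC UUID: 00000000-0000-0000-0a2a-ca25127eb373 |
|           | PCI BDF Address: 0000:6c:00.0                  |
+-----------+------------------------------------------------+
| 3         | Device Name: Intel(R) Data Center GPU Max 1550 |
|           | SOC UUID: 00000000-0000-0000-e986-d69bb5dc50cb |
|           | PCI BDF Address: 0001:18:00.0                  |
+-----------+------------------------------------------------+
"""
devices = parse_xpu_smi_table(sample)
print(devices)
```

Splitting each key from its value on the first `": "` keeps values that themselves contain colons, such as the PCI BDF address, intact.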
- find_accelerators() → AcceleratorDescriptor
Scan for accelerators across all supported vendors.
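One plausible shape for such a scan is to try each vendor's finder in turn and describe the first vendor that reports devices. The sketch below is an assumption about the control flow, not Dragon's implementation: `scan_vendors` is a hypothetical function, the finders are passed in as plain callables so the example runs without any smi tool installed, and a dict stands in for `AcceleratorDescriptor`.

```python
from typing import Callable, Optional

# Environment variable names as listed under AccEnvStr
ENV_BY_VENDOR = {
    "NVIDIA": "CUDA_VISIBLE_DEVICES",
    "AMD": "ROCR_VISIBLE_DEVICES",
    "INTEL": "ZE_AFFINITY_MASK",
}


def scan_vendors(finders: dict[str, Callable[[], Optional[list]]]) -> dict:
    """Return a descriptor-like dict for the first vendor with devices."""
    for vendor, finder in finders.items():
        devices = finder()
        if devices:  # None or an empty list means nothing was found
            return {
                "vendor": vendor,
                # Assumption: devices are addressed by position, 0..n-1
                "device_list": list(range(len(devices))),
                "env_str": ENV_BY_VENDOR[vendor],
            }
    # Mirrors the documented defaults: unknown vendor, no devices, empty env_str
    return {"vendor": "UNKNOWN", "device_list": [], "env_str": ""}


found = scan_vendors({
    "NVIDIA": lambda: None,
    "AMD": lambda: ["card2,0x4eda2591da9a0592", "card3,0x96ca52b5699c2baf"],
    "INTEL": lambda: [],
})
print(found)
```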