dragon.infrastructure.gpu_desc

Module for detecting GPU devices across different vendors.

Functions

find_accelerators()

Scan for accelerators across all supported vendors.

find_amd()

Return a list of AMD GPUs reported by rocm-smi.

find_intel()

Return a list of Intel GPUs reported by xpu-smi.

find_nvidia()

Return a list of Nvidia GPUs reported by nvidia-smi.

Classes

AccEnvStr

AccVendor

AcceleratorDescriptor

AcceleratorDescriptor(vendor: dragon.infrastructure.gpu_desc.AccVendor = <AccVendor.UNKNOWN: 4>, device_list: list[int] = <factory>, env_str: str = '')

class AccVendor

Bases: IntEnum

NVIDIA = 1
AMD = 2
INTEL = 3
UNKNOWN = 4
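Because AccVendor derives from IntEnum, its members are also plain integers: AccVendor.UNKNOWN compares equal to 4, and an integer read back from serialized data can be turned into a member with AccVendor(value). The sketch below uses a local mirror of the documented members rather than importing dragon:

```python
from enum import IntEnum


class AccVendor(IntEnum):
    """Local mirror of the documented members of
    dragon.infrastructure.gpu_desc.AccVendor (values copied from the docs)."""

    NVIDIA = 1
    AMD = 2
    INTEL = 3
    UNKNOWN = 4


# IntEnum members are ints, so they compare equal to their integer values.
assert AccVendor.UNKNOWN == 4
# An integer value maps back to its member, e.g. after deserialization.
assert AccVendor(2) is AccVendor.AMD
print(AccVendor(2).name)  # prints "AMD"
```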

class AccEnvStr

Bases: object

NVIDIA = 'CUDA_VISIBLE_DEVICES'
AMD = 'ROCR_VISIBLE_DEVICES'
HIP = 'HIP_VISIBLE_DEVICES'
INTEL = 'ZE_AFFINITY_MASK'
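Each AccEnvStr attribute names the environment variable that a vendor runtime reads to restrict which devices a process can see. As a minimal sketch, assuming the variable takes a comma-separated list of device indices (the common form for all four), a hypothetical helper visible_devices_env could build the entry for a child process's environment. For AMD the sketch picks ROCR_VISIBLE_DEVICES; which of the two AMD variables Dragon actually sets is not stated here.

```python
from enum import IntEnum


class AccVendor(IntEnum):
    # Local mirror of the documented enum values.
    NVIDIA = 1
    AMD = 2
    INTEL = 3
    UNKNOWN = 4


class AccEnvStr:
    # Local mirror of the documented environment variable names.
    NVIDIA = 'CUDA_VISIBLE_DEVICES'
    AMD = 'ROCR_VISIBLE_DEVICES'
    HIP = 'HIP_VISIBLE_DEVICES'
    INTEL = 'ZE_AFFINITY_MASK'


# Hypothetical mapping from vendor to the variable that limits device visibility.
_ENV_FOR_VENDOR = {
    AccVendor.NVIDIA: AccEnvStr.NVIDIA,
    AccVendor.AMD: AccEnvStr.AMD,
    AccVendor.INTEL: AccEnvStr.INTEL,
}


def visible_devices_env(vendor: AccVendor, devices: list[int]) -> dict[str, str]:
    """Hypothetical helper: build an environment entry that restricts a
    process to the given device indices, joined with commas."""
    name = _ENV_FOR_VENDOR.get(vendor)
    if name is None:
        # UNKNOWN vendor: there is no variable to set.
        return {}
    return {name: ",".join(str(d) for d in devices)}


print(visible_devices_env(AccVendor.NVIDIA, [0, 2]))
# prints {'CUDA_VISIBLE_DEVICES': '0,2'}
```

The returned dict can be merged into a copy of os.environ before launching a worker.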

class AcceleratorDescriptor

Bases: object

AcceleratorDescriptor(vendor: dragon.infrastructure.gpu_desc.AccVendor = <AccVendor.UNKNOWN: 4>, device_list: list[int] = <factory>, env_str: str = '')

vendor: AccVendor = AccVendor.UNKNOWN
device_list: list[int]
env_str: str = ''
get_sdict()
classmethod from_sdict(sdict)
__init__(vendor: dragon.infrastructure.gpu_desc.AccVendor = AccVendor.UNKNOWN, device_list: list[int] = <factory>, env_str: str = '') → None
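The <factory> default means device_list is declared with a dataclass default_factory, so each descriptor gets its own empty list. The sketch below mirrors the documented fields locally; the serialized form used by get_sdict and from_sdict is not documented here, so the plain dict of the three fields (with vendor stored as an int) is an assumption:

```python
from dataclasses import asdict, dataclass, field
from enum import IntEnum


class AccVendor(IntEnum):
    # Local mirror of the documented enum values.
    NVIDIA = 1
    AMD = 2
    INTEL = 3
    UNKNOWN = 4


@dataclass
class AcceleratorDescriptor:
    """Local mirror of the documented fields; the real class lives in
    dragon.infrastructure.gpu_desc."""

    vendor: AccVendor = AccVendor.UNKNOWN
    device_list: list[int] = field(default_factory=list)
    env_str: str = ''

    def get_sdict(self) -> dict:
        # Assumption: the serialized form is a plain dict of the three fields.
        sdict = asdict(self)
        sdict['vendor'] = int(self.vendor)
        return sdict

    @classmethod
    def from_sdict(cls, sdict: dict) -> 'AcceleratorDescriptor':
        # Assumption: the inverse of get_sdict above.
        return cls(
            vendor=AccVendor(sdict['vendor']),
            device_list=list(sdict['device_list']),
            env_str=sdict['env_str'],
        )


desc = AcceleratorDescriptor(AccVendor.NVIDIA, [0, 1], 'CUDA_VISIBLE_DEVICES')
print(desc.get_sdict())
# prints {'vendor': 1, 'device_list': [0, 1], 'env_str': 'CUDA_VISIBLE_DEVICES'}
assert AcceleratorDescriptor.from_sdict(desc.get_sdict()) == desc
```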

find_nvidia() → list

Return a list of Nvidia GPUs reported by nvidia-smi. Expected output from nvidia-smi:

.
.
.
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-ccdb6af5-102b-3fb4-4e06-8b7aaeba0578)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-43539da6-e86a-d93e-8db2-f8814ef47c41)
.
.
.
Returns:

list of GPUs with IDs.

Return type:

list
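Lines in the format shown above can be parsed with a regular expression. A minimal sketch, assuming the text comes from a listing such as nvidia-smi -L (which prints one "GPU <index>: <name> (UUID: <uuid>)" line per device) and that the index and UUID are the values of interest; parse_nvidia_listing is a hypothetical name, not part of dragon.infrastructure.gpu_desc:

```python
import re

# Matches lines such as
# "GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-ccdb6af5-...)".
_NVIDIA_LINE = re.compile(r'^GPU (\d+): (.+) \(UUID: (GPU-[0-9a-fA-F-]+)\)$')


def parse_nvidia_listing(text: str) -> list[tuple[int, str]]:
    """Hypothetical parser: return (index, UUID) pairs, skipping any
    line that does not match the expected format."""
    gpus = []
    for line in text.splitlines():
        m = _NVIDIA_LINE.match(line.strip())
        if m:
            gpus.append((int(m.group(1)), m.group(3)))
    return gpus


sample = """\
.
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-ccdb6af5-102b-3fb4-4e06-8b7aaeba0578)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-43539da6-e86a-d93e-8db2-f8814ef47c41)
.
"""
print([idx for idx, _ in parse_nvidia_listing(sample)])  # prints [1, 2]
```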

find_amd() → list

Return a list of AMD GPUs reported by rocm-smi. Expected output from rocm-smi:

.
.
.
card2,0x4eda2591da9a0592
card3,0x96ca52b5699c2baf
.
.
.
Returns:

list of AMD cards.

Return type:

list
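A similar sketch for the rocm-smi lines shown above, assuming each device appears as card<N>,<hex id>; parse_amd_listing is a hypothetical name, and it skips any line that does not match:

```python
import re

# Matches lines such as "card2,0x4eda2591da9a0592".
_AMD_LINE = re.compile(r'^card(\d+),(0x[0-9a-fA-F]+)$')


def parse_amd_listing(text: str) -> list[tuple[int, str]]:
    """Hypothetical parser: return (card number, hex ID) pairs."""
    cards = []
    for line in text.splitlines():
        m = _AMD_LINE.match(line.strip())
        if m:
            cards.append((int(m.group(1)), m.group(2)))
    return cards


sample = ".\ncard2,0x4eda2591da9a0592\ncard3,0x96ca52b5699c2baf\n.\n"
print(parse_amd_listing(sample))
# prints [(2, '0x4eda2591da9a0592'), (3, '0x96ca52b5699c2baf')]
```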

find_intel() → list

Return a list of Intel GPUs reported by xpu-smi. Expected output from xpu-smi:

.
.
.
+-----------+--------------------------------------------------------------------------------------+
| 2         | Device Name: Intel(R) Data Center GPU Max 1550                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0000-0a2a-ca25127eb373                                       |
|           | PCI BDF Address: 0000:6c:00.0                                                        |
|           | DRM Device: /dev/dri/card2                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 3         | Device Name: Intel(R) Data Center GPU Max 1550                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0000-e986-d69bb5dc50cb                                       |
|           | PCI BDF Address: 0001:18:00.0                                                        |
|           | DRM Device: /dev/dri/card3                                                           |
|           | Function Type: physical                                                              |
.
.
.
Returns:

list of tuples, each holding a GPU device number and its ID.

Return type:

list
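The xpu-smi table spreads each device over several rows, so a parser has to remember the device number from the first row of a block until it reaches the ID. A minimal sketch, assuming the ID in each tuple is the SOC UUID (the docstring does not say which field is used); parse_intel_listing is a hypothetical name, and the sample is an abridged copy of the table shown above:

```python
import re

# The first column holds the device number on the first row of each block.
_DEVICE_ROW = re.compile(r'^\|\s*(\d+)\s*\|')
# The SOC UUID appears on a later row of the same block.
_UUID_ROW = re.compile(r'SOC UUID:\s*([0-9a-fA-F-]+)')


def parse_intel_listing(text: str) -> list[tuple[int, str]]:
    """Hypothetical parser: return (device number, SOC UUID) tuples."""
    devices = []
    current = None
    for line in text.splitlines():
        m = _DEVICE_ROW.match(line)
        if m:
            current = int(m.group(1))
            continue
        u = _UUID_ROW.search(line)
        if u and current is not None:
            devices.append((current, u.group(1)))
            current = None  # wait for the next device block
    return devices


sample = """\
+-----------+------------------------------------------------+
| 2         | Device Name: Intel(R) Data Center GPU Max 1550 |
|           | Vendor Name: Intel(R) Corporation              |
|           | SOC UUID: 00000000-0000-0000-0a2a-ca25127eb373 |
|           | PCI BDF Address: 0000:6c:00.0                  |
+-----------+------------------------------------------------+
| 3         | Device Name: Intel(R) Data Center GPU Max 1550 |
|           | SOC UUID: 00000000-0000-0000-e986-d69bb5dc50cb |
+-----------+------------------------------------------------+
"""
print([num for num, _ in parse_intel_listing(sample)])  # prints [2, 3]
```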

find_accelerators() → AcceleratorDescriptor

Scan for accelerators across all supported vendors.
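One plausible shape for such a scan, not necessarily Dragon's actual logic, is to try each vendor's finder in turn, skip vendors whose tool cannot be run, and wrap the first non-empty result in an AcceleratorDescriptor. The real find_accelerators() takes no arguments; this sketch injects the finders so it can run without GPUs, and scan plus the local AccVendor and AcceleratorDescriptor mirrors are assumptions:

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable


class AccVendor(IntEnum):
    # Local mirror of the documented enum values.
    NVIDIA = 1
    AMD = 2
    INTEL = 3
    UNKNOWN = 4


@dataclass
class AcceleratorDescriptor:
    # Local mirror of the documented fields.
    vendor: AccVendor = AccVendor.UNKNOWN
    device_list: list[int] = field(default_factory=list)
    env_str: str = ''


# Environment variable names copied from AccEnvStr (ROCR chosen for AMD).
_ENV = {
    AccVendor.NVIDIA: 'CUDA_VISIBLE_DEVICES',
    AccVendor.AMD: 'ROCR_VISIBLE_DEVICES',
    AccVendor.INTEL: 'ZE_AFFINITY_MASK',
}


def scan(finders: dict[AccVendor, Callable[[], list[int]]]) -> AcceleratorDescriptor:
    """Hypothetical scan: return a descriptor for the first vendor whose
    finder reports devices, or an UNKNOWN descriptor if none do."""
    for vendor, finder in finders.items():
        try:
            devices = finder()
        except OSError:
            # Tool missing or not runnable (FileNotFoundError is an OSError).
            continue
        if devices:
            return AcceleratorDescriptor(vendor, list(devices), _ENV[vendor])
    return AcceleratorDescriptor()


def _missing_tool() -> list[int]:
    # Stands in for a vendor whose SMI tool is not installed.
    raise FileNotFoundError('nvidia-smi')


desc = scan({AccVendor.NVIDIA: _missing_tool, AccVendor.AMD: lambda: [2, 3]})
print(desc.vendor.name, desc.device_list, desc.env_str)
# prints AMD [2, 3] ROCR_VISIBLE_DEVICES
```

Stopping at the first vendor with devices assumes a node carries accelerators from a single vendor.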