Controlling GPU Affinity

This tutorial draws from the Policy documentation to walk through how to use ProcessGroup and Pool to control which GPUs a given function or process uses.

Policies via the System API

The simplest way to create a list of Policies, one for each GPU on every node Dragon is running on, is the gpu_policies() method. In the example below, we’ll apply each Policy to 4 processes in a Pool.

Listing 46 Create a list of Policies specifying GPU affinity
from dragon.native.machine import System
from dragon.native.pool import Pool


def gpu_work(item):
    # GPU processing code, such as PyTorch or CuPy
    ...


gpu_policies = System().gpu_policies()
nworkers = 4 * len(gpu_policies)
p = Pool(policy=gpu_policies, processes_per_policy=4)

results = p.map_async(gpu_work, range(100)).get()
print(f"{nworkers} workers say: {results}", flush=True)

p.close()
p.join()
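
If you want to inspect what gpu_policies() generated before handing it to Pool, you can iterate over the list and print each Policy. The snippet below is a quick sketch; the host_name and gpu_affinity attributes are assumed to match the fields used when constructing Policies manually later in this tutorial.

from dragon.native.machine import System

# sketch: print the host and GPU each generated Policy points at
# (attribute names assumed from the Policy constructor used later in this tutorial)
for policy in System().gpu_policies():
    print(f"host={policy.host_name} gpu_affinity={policy.gpu_affinity}", flush=True)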

Manually Derived Policies

To take full control over generating policies and applying them to processes, the first step is to ask Dragon about the infrastructure it is running on and which GPUs are available. The most direct way to access this information is through System and Node. The example below shows how to use System and Node to inspect each node that Dragon is running on and build a list of hostnames and GPU IDs. You can of course add any other logic you like to focus on a subset of the nodes or to filter by other node attributes (a small filtering sketch follows the listing).

Listing 47 Scan all nodes for GPUs and create a list of tuples containing hostname and GPU ID
from dragon.native.machine import System, Node


def find_gpus():

    all_gpus = []
    # loop through all nodes Dragon is running on
    for huid in System().nodes:
        node = Node(huid)
        # loop through however many GPUs it may have
        for gpu_id in node.gpus:
            all_gpus.append((node.hostname, gpu_id))
    return all_gpus
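
As one example of the filtering mentioned above, the sketch below keeps only nodes whose hostname starts with a given prefix. The hostname_prefix parameter and the find_gpus_filtered() name are purely illustrative; any predicate on the Node object would work the same way.

from dragon.native.machine import System, Node


# hypothetical variant of find_gpus() that keeps only nodes whose
# hostname starts with a given prefix
def find_gpus_filtered(hostname_prefix=""):
    all_gpus = []
    for huid in System().nodes:
        node = Node(huid)
        if not node.hostname.startswith(hostname_prefix):
            continue
        for gpu_id in node.gpus:
            all_gpus.append((node.hostname, gpu_id))
    return all_gpus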

Next we’ll use that list of tuples to create a list of Policies where each Policy specifies a host and a GPU on that host.

Listing 48 Given a list of (hostname, gpu_id), create a list of Policies specifying GPU affinity
from dragon.infrastructure.policy import Policy


# pass in the output from find_gpus() above
def make_policies(all_gpus=None, nprocs=32):

    # loop over each desired Policy
    # the number of which will be the number of processes we'll launch with ProcessGroup
    policies = []
    i = 0
    for _ in range(nprocs):
        # assign the GPUs in a round-robin fashion
        policies.append(Policy(placement=Policy.Placement.HOST_NAME,
                               host_name=all_gpus[i][0],
                               gpu_affinity=[all_gpus[i][1]]))
        i += 1
        if i == len(all_gpus):
            i = 0
    return policies
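
Putting the two helpers together, the round-robin assignment means that with 16 GPUs (4 nodes with 4 GPUs each) and nprocs=32, each GPU ends up with two processes pinned to it. A quick sanity check might look like the following sketch; the Counter-based summary is just for illustration and assumes the Policy object exposes host_name and gpu_affinity as attributes.

from collections import Counter

all_gpus = find_gpus()
policies = make_policies(all_gpus=all_gpus, nprocs=32)

# count how many processes will land on each (host, gpu) pair
per_gpu = Counter((p.host_name, p.gpu_affinity[0]) for p in policies)
for (host, gpu), count in sorted(per_gpu.items()):
    print(f"{host} GPU {gpu}: {count} processes", flush=True)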

Test It Out with Native Pool

Now that we can build a list of Policies for our processes, let’s try it out using Pool. In the example below, each worker will first report which host and GPU it has been assigned to verify that its dragon.infrastructure.policy.Policy is working as intended. Then we’ll use PyTorch to do some computation on the specified GPU.

Listing 49 Run a native Pool where workers are assigned a GPU to use
import os
import torch
import numpy as np

from dragon.native.machine import current
from dragon.native.pool import Pool


# reuse find_gpus() and make_policies() from above

# GPU affinity is specified to the process by Dragon using the relevant method/environment variable,
# such as CUDA_VISIBLE_DEVICES for NVIDIA devices (AMD and Intel also supported, see dragon.infrastructure.gpu_desc)
# we'll assume NVIDIA GPUs for this example and verify CUDA_VISIBLE_DEVICES
def my_gpu():
    mynode = current()
    print(f"Hello!, I have GPU={os.getenv('CUDA_VISIBLE_DEVICES')} on host={mynode.hostname}", flush=True)


# do some matrix multiplication
def gpu_work(x):
    v = np.array(512 * [x * 1.0])
    nx = 16
    ny = 512 // nx
    a = v.reshape(ny, nx)
    b = v.reshape(nx, ny)
    tensor_a = torch.from_numpy(a).cuda()
    tensor_b = torch.from_numpy(b).cuda()
    output = torch.sum(torch.matmul(tensor_a, tensor_b)).cpu().item()

    del tensor_a, tensor_b
    torch.cuda.empty_cache()
    return output


# run a native Pool with the given number of workers, each assigned a single GPU
def gpu_pool(nprocs=32):
    all_gpus = find_gpus()
    policies = make_policies(all_gpus=all_gpus, nprocs=nprocs)

    # light up as many as nprocs worth of GPUs!
    p = Pool(policy=policies, processes_per_policy=1, initializer=my_gpu)
    results = p.map_async(gpu_work, range(32)).get()
    p.close()
    p.join()
    return results


if __name__ == '__main__':
    gpu_pool()

Running this example on 4 nodes, each equipped with 4 NVIDIA A100 GPUs, gives us:

$ pip install torch numpy
$ dragon gpu_pool.py
Hello!, I have GPU=0 on host=pinoak0039
Hello!, I have GPU=1 on host=pinoak0039
Hello!, I have GPU=1 on host=pinoak0035
Hello!, I have GPU=0 on host=pinoak0034
Hello!, I have GPU=0 on host=pinoak0036
Hello!, I have GPU=2 on host=pinoak0039
Hello!, I have GPU=2 on host=pinoak0035
Hello!, I have GPU=1 on host=pinoak0034
Hello!, I have GPU=3 on host=pinoak0039
Hello!, I have GPU=3 on host=pinoak0036
Hello!, I have GPU=3 on host=pinoak0035
Hello!, I have GPU=2 on host=pinoak0039
Hello!, I have GPU=1 on host=pinoak0034
Hello!, I have GPU=2 on host=pinoak0036
Hello!, I have GPU=0 on host=pinoak0035
Hello!, I have GPU=1 on host=pinoak0036
Hello!, I have GPU=2 on host=pinoak0035
Hello!, I have GPU=0 on host=pinoak0039
Hello!, I have GPU=2 on host=pinoak0034
Hello!, I have GPU=1 on host=pinoak0035
Hello!, I have GPU=1 on host=pinoak0036
Hello!, I have GPU=2 on host=pinoak0034
Hello!, I have GPU=3 on host=pinoak0039
Hello!, I have GPU=0 on host=pinoak0036
Hello!, I have GPU=0 on host=pinoak0035
Hello!, I have GPU=1 on host=pinoak0039
Hello!, I have GPU=0 on host=pinoak0034
Hello!, I have GPU=2 on host=pinoak0036
Hello!, I have GPU=3 on host=pinoak0034
Hello!, I have GPU=3 on host=pinoak0035
Hello!, I have GPU=3 on host=pinoak0034
Hello!, I have GPU=3 on host=pinoak0036
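
Note that because Dragon sets CUDA_VISIBLE_DEVICES individually for each process, every worker sees exactly one GPU, and PyTorch addresses it as device 0 regardless of its physical index on the node. A quick way to confirm this from inside a worker is sketched below (this helper is illustrative and not part of the listing above).

import os
import torch


def report_visible_devices():
    # with CUDA_VISIBLE_DEVICES restricted to a single GPU, torch sees one
    # device and it is always addressed as cuda:0 within this process
    print(f"CUDA_VISIBLE_DEVICES={os.getenv('CUDA_VISIBLE_DEVICES')}, "
          f"device_count={torch.cuda.device_count()}, "
          f"current_device={torch.cuda.current_device()}", flush=True)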

Test It Out with ProcessGroup

Next we’ll adapt some of the code above to run with ProcessGroup, which gives us a little more control over what the processes do. We’ll still run a Python function in this example, but you could instead run serial executables or even MPI processes this way (see Orchestrate Processes and Orchestrate MPI Applications); a short sketch of launching an executable follows the example output below.

Listing 50 Run a ProcessGroup where each process is assigned a single GPU
import os
import torch
import numpy as np

from dragon.native.machine import current
from dragon.native.process_group import ProcessGroup, ProcessTemplate


# reuse find_gpus() and make_policies() from above

# GPU affinity is specified to the process by Dragon using the relevant method/environment variable,
# such as CUDA_VISIBLE_DEVICES for NVIDIA devices (AMD and Intel also supported, see dragon.infrastructure.gpu_desc)
# we'll assume NVIDIA GPUs for this example and verify CUDA_VISIBLE_DEVICES
def my_gpu(id, x=512):
    mynode = current()
    print(f"ID {id} has GPU={os.getenv('CUDA_VISIBLE_DEVICES')} on host={mynode.hostname}", flush=True)

    # reuse the definition of gpu_work() from above
    gpu_work(x)


def gpu_pg(nprocs=32):
    all_gpus = find_gpus()
    policies = make_policies(all_gpus, nprocs=nprocs)

    # light up as many as nprocs worth of GPUs!
    pg = ProcessGroup()
    for i in range(nprocs):
        pg.add_process(nproc=1, template=ProcessTemplate(target=my_gpu, args=(i, i), policy=policies[i]))

    pg.init()
    pg.start()
    pg.join()
    pg.close()


if __name__ == '__main__':
    gpu_pg()

Running this example on 4 nodes, each equipped with 4 NVIDIA A100 GPUs, gives us:

$ pip install torch numpy
$ dragon gpu_process_group.py
ID 18 has GPU=2 on host=pinoak0039
ID 2 has GPU=2 on host=pinoak0039
ID 0 has GPU=0 on host=pinoak0039
ID 19 has GPU=3 on host=pinoak0039
ID 17 has GPU=1 on host=pinoak0039
ID 16 has GPU=0 on host=pinoak0039
ID 3 has GPU=3 on host=pinoak0039
ID 1 has GPU=1 on host=pinoak0039
ID 30 has GPU=2 on host=pinoak0036
ID 29 has GPU=1 on host=pinoak0036
ID 28 has GPU=0 on host=pinoak0036
ID 12 has GPU=0 on host=pinoak0036
ID 13 has GPU=1 on host=pinoak0036
ID 25 has GPU=1 on host=pinoak0034
ID 15 has GPU=3 on host=pinoak0036
ID 6 has GPU=2 on host=pinoak0035
ID 22 has GPU=2 on host=pinoak0035
ID 4 has GPU=0 on host=pinoak0035
ID 14 has GPU=2 on host=pinoak0036
ID 24 has GPU=0 on host=pinoak0034
ID 20 has GPU=0 on host=pinoak0035
ID 5 has GPU=1 on host=pinoak0035
ID 9 has GPU=1 on host=pinoak0034
ID 27 has GPU=3 on host=pinoak0034
ID 31 has GPU=3 on host=pinoak0036
ID 8 has GPU=0 on host=pinoak0034
ID 21 has GPU=1 on host=pinoak0035
ID 23 has GPU=3 on host=pinoak0035
ID 7 has GPU=3 on host=pinoak0035
ID 11 has GPU=3 on host=pinoak0034
ID 10 has GPU=2 on host=pinoak0034
ID 26 has GPU=2 on host=pinoak0034
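
The same policy plumbing applies when ProcessGroup launches executables instead of Python functions. The sketch below pins a hypothetical ./gpu_app binary to one GPU per process; the binary name and its argument are placeholders, and the policies come from make_policies() exactly as before.

from dragon.native.process_group import ProcessGroup, ProcessTemplate


# a minimal sketch, assuming a hypothetical ./gpu_app executable that honors
# CUDA_VISIBLE_DEVICES; reuses find_gpus() and make_policies() from above
def gpu_pg_exe(nprocs=32):
    policies = make_policies(find_gpus(), nprocs=nprocs)

    pg = ProcessGroup()
    for i in range(nprocs):
        pg.add_process(nproc=1,
                       template=ProcessTemplate(target="./gpu_app", args=[str(i)], policy=policies[i]))

    pg.init()
    pg.start()
    pg.join()
    pg.close()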