Controlling GPU Affinity
This tutorial draws from the Policy documentation to walk through how to use ProcessGroup to control which GPUs a given function or process uses, as well as how to do the same with Pool.
Policies via the System API
The simplest way to create a list of Policies, one for each GPU on every node Dragon is running on, is with the gpu_policies() method. In the example below, we’ll apply each Policy to 4 processes in a Pool.
from dragon.native.machine import System
from dragon.native.pool import Pool


def gpu_work(item):
    # GPU processing code, such as PyTorch or CuPy
    return item


if __name__ == '__main__':
    gpu_policies = System().gpu_policies()
    nworkers = 4 * len(gpu_policies)
    p = Pool(policy=gpu_policies, processes_per_policy=4)

    results = p.map_async(gpu_work, range(100)).get()
    print(f"{nworkers} workers say: {results}", flush=True)

    p.close()
    p.join()
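If you want to confirm where Dragon will place workers before building the Pool, you can inspect the returned policies directly. This quick check is not part of the original example; it assumes each Policy exposes the same host_name and gpu_affinity fields that the manual examples below set explicitly.

from dragon.native.machine import System

# print the placement encoded in each auto-generated Policy
for policy in System().gpu_policies():
    print(f"host={policy.host_name} gpu_affinity={policy.gpu_affinity}", flush=True)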
Manually Derived Policies
To take full control over generating policies and applying them to processes, the first step is to ask Dragon about the infrastructure it is running on and which GPUs are available. The most direct way to access this information is through System and Node. The example below shows how to use System and Node to manually inspect each node that Dragon is running on and build a list of hostnames and GPU IDs. You can of course add any other logic you like to focus on just a subset of the nodes or to filter on other node attributes.
from dragon.native.machine import System, Node


def find_gpus():

    all_gpus = []
    # loop through all nodes Dragon is running on
    for huid in System().nodes:
        node = Node(huid)
        # loop through however many GPUs it may have
        for gpu_id in node.gpus:
            all_gpus.append((node.hostname, gpu_id))
    return all_gpus
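If you only want a subset of the nodes, the same loop can filter as it goes. The variation below is just an illustrative sketch: the hostname prefix is a hypothetical criterion, and any other Node attribute could be used instead.

from dragon.native.machine import System, Node


def find_gpus_filtered(host_prefix=""):
    selected = []
    for huid in System().nodes:
        node = Node(huid)
        # skip nodes without GPUs or whose hostname misses our (hypothetical) prefix criterion
        if not node.gpus or not node.hostname.startswith(host_prefix):
            continue
        for gpu_id in node.gpus:
            selected.append((node.hostname, gpu_id))
    return selected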
Next we’ll use that list of tuples to create a list of Policies, where each Policy specifies a host and a GPU on that host.
from dragon.infrastructure.policy import Policy


# pass in the output from find_gpus() above
def make_policies(all_gpus, nprocs=32):

    # create one Policy per process we'll launch
    policies = []
    i = 0
    for worker in range(nprocs):
        # assign hosts and GPUs in a round-robin fashion
        policies.append(Policy(placement=Policy.Placement.HOST_NAME,
                               host_name=all_gpus[i][0],
                               gpu_affinity=[all_gpus[i][1]]))
        i = (i + 1) % len(all_gpus)
    return policies
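To make the round-robin assignment concrete, here is what make_policies() produces for a small, hypothetical configuration of two hosts with two GPUs each (the hostnames are made up for illustration):

# illustration only: two hypothetical hosts, each reporting GPUs 0 and 1
all_gpus = [("nid0001", 0), ("nid0001", 1), ("nid0002", 0), ("nid0002", 1)]

for p in make_policies(all_gpus, nprocs=6):
    print(p.host_name, p.gpu_affinity)
# nid0001 [0]
# nid0001 [1]
# nid0002 [0]
# nid0002 [1]
# nid0001 [0]
# nid0001 [1]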
Test It Out with Native Pool
Now that we can build a list of Policies for our processes, let’s try it out using Pool. In the example below, each worker will first report what host and GPU it will use to verify its dragon.infrastructure.policy.Policy is working as intended. Then we’ll use PyTorch to do some computation on the specified GPU.
import os
import torch
import numpy as np

from dragon.native.machine import current
from dragon.native.pool import Pool


# reuse find_gpus() and make_policies() from above

# GPU affinity is specified to the process by Dragon using the relevant method/environment variable,
# such as CUDA_VISIBLE_DEVICES for NVIDIA devices (AMD and Intel also supported, see dragon.infrastructure.gpu_desc)
# we'll assume NVIDIA GPUs for this example and verify CUDA_VISIBLE_DEVICES
def my_gpu():
    mynode = current()
    print(f"Hello!, I have GPU={os.getenv('CUDA_VISIBLE_DEVICES')} on host={mynode.hostname}", flush=True)


# do some matrix multiplication
def gpu_work(x):
    v = np.array(512*[x*1.0])
    nx = 16
    ny = 512 // 16
    a = v.reshape(ny, nx)
    b = v.reshape(nx, ny)
    tensor_a = torch.from_numpy(a).cuda()
    tensor_b = torch.from_numpy(b).cuda()
    output = torch.sum(torch.matmul(tensor_a, tensor_b)).cpu().item()

    del tensor_a, tensor_b
    torch.cuda.empty_cache()
    return output


# run a native Pool with the given number of workers, each assigned a single GPU
def gpu_pool(nprocs=32):
    all_gpus = find_gpus()
    policies = make_policies(all_gpus=all_gpus, nprocs=nprocs)

    # light up as many as nprocs worth of GPUs!
    p = Pool(policy=policies, processes_per_policy=1, initializer=my_gpu)
    results = p.map_async(gpu_work, range(32)).get()
    p.close()
    p.join()
    return results


if __name__ == '__main__':
    gpu_pool()
Running this example on 4 nodes, each equipped with 4 NVIDIA A100 GPUs, gives us:
$ pip install torch numpy
$ dragon gpu_pool.py
Hello!, I have GPU=0 on host=pinoak0039
Hello!, I have GPU=1 on host=pinoak0039
Hello!, I have GPU=1 on host=pinoak0035
Hello!, I have GPU=0 on host=pinoak0034
Hello!, I have GPU=0 on host=pinoak0036
Hello!, I have GPU=2 on host=pinoak0039
Hello!, I have GPU=2 on host=pinoak0035
Hello!, I have GPU=1 on host=pinoak0034
Hello!, I have GPU=3 on host=pinoak0039
Hello!, I have GPU=3 on host=pinoak0036
Hello!, I have GPU=3 on host=pinoak0035
Hello!, I have GPU=2 on host=pinoak0039
Hello!, I have GPU=1 on host=pinoak0034
Hello!, I have GPU=2 on host=pinoak0036
Hello!, I have GPU=0 on host=pinoak0035
Hello!, I have GPU=1 on host=pinoak0036
Hello!, I have GPU=2 on host=pinoak0035
Hello!, I have GPU=0 on host=pinoak0039
Hello!, I have GPU=2 on host=pinoak0034
Hello!, I have GPU=1 on host=pinoak0035
Hello!, I have GPU=1 on host=pinoak0036
Hello!, I have GPU=2 on host=pinoak0034
Hello!, I have GPU=3 on host=pinoak0039
Hello!, I have GPU=0 on host=pinoak0036
Hello!, I have GPU=0 on host=pinoak0035
Hello!, I have GPU=1 on host=pinoak0039
Hello!, I have GPU=0 on host=pinoak0034
Hello!, I have GPU=2 on host=pinoak0036
Hello!, I have GPU=3 on host=pinoak0034
Hello!, I have GPU=3 on host=pinoak0035
Hello!, I have GPU=3 on host=pinoak0034
Hello!, I have GPU=3 on host=pinoak0036
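The initializer above assumes NVIDIA hardware, where Dragon sets CUDA_VISIBLE_DEVICES. On AMD or Intel GPUs, Dragon communicates the affinity through a different mechanism (see dragon.infrastructure.gpu_desc for the authoritative details). The variant below is only a sketch under that assumption: the vendor variable names it checks are common conventions, not confirmed from gpu_desc.

import os

from dragon.native.machine import current


# hedged variant of my_gpu(): the AMD/Intel variable names are assumptions, not taken from dragon.infrastructure.gpu_desc
def my_gpu_any_vendor():
    hostname = current().hostname
    for var in ("CUDA_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES", "HIP_VISIBLE_DEVICES", "ZE_AFFINITY_MASK"):
        value = os.getenv(var)
        if value is not None:
            print(f"Hello!, I have {var}={value} on host={hostname}", flush=True)
            return
    print(f"No GPU affinity variable found on host={hostname}", flush=True)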
Test It Out with ProcessGroup
Next we’ll adapt some of the code above to run with ProcessGroup, which gives us a little more control over what the processes do. We’ll still run a Python function in this example, but you could instead run serial executables or even MPI processes this way (see Orchestrate Processes and Orchestrate MPI Applications); a brief sketch of the executable case follows the output at the end of this section.
import os
import torch
import numpy as np

from dragon.native.machine import current
from dragon.native.process_group import ProcessGroup, ProcessTemplate


# reuse find_gpus() and make_policies() from above

# GPU affinity is specified to the process by Dragon using the relevant method/environment variable,
# such as CUDA_VISIBLE_DEVICES for NVIDIA devices (AMD and Intel also supported, see dragon.infrastructure.gpu_desc)
# we'll assume NVIDIA GPUs for this example and verify CUDA_VISIBLE_DEVICES
def my_gpu(id, x=512):
    mynode = current()
    print(f"ID {id} has GPU={os.getenv('CUDA_VISIBLE_DEVICES')} on host={mynode.hostname}", flush=True)

    # reuse the definition of gpu_work() from above
    gpu_work(x)


def gpu_pg(nprocs=32):
    all_gpus = find_gpus()
    policies = make_policies(all_gpus, nprocs=nprocs)

    # light up as many as nprocs worth of GPUs!
    pg = ProcessGroup()
    for i in range(nprocs):
        pg.add_process(nproc=1, template=ProcessTemplate(target=my_gpu, args=(i, i,), policy=policies[i]))

    pg.init()
    pg.start()
    pg.join()
    pg.close()


if __name__ == '__main__':
    gpu_pg()
Running this example on 4 nodes, each equipped with 4 NVIDIA A100 GPUs, gives us:
$ pip install torch numpy
$ dragon gpu_process_group.py
ID 18 has GPU=2 on host=pinoak0039
ID 2 has GPU=2 on host=pinoak0039
ID 0 has GPU=0 on host=pinoak0039
ID 19 has GPU=3 on host=pinoak0039
ID 17 has GPU=1 on host=pinoak0039
ID 16 has GPU=0 on host=pinoak0039
ID 3 has GPU=3 on host=pinoak0039
ID 1 has GPU=1 on host=pinoak0039
ID 30 has GPU=2 on host=pinoak0036
ID 29 has GPU=1 on host=pinoak0036
ID 28 has GPU=0 on host=pinoak0036
ID 12 has GPU=0 on host=pinoak0036
ID 13 has GPU=1 on host=pinoak0036
ID 25 has GPU=1 on host=pinoak0034
ID 15 has GPU=3 on host=pinoak0036
ID 6 has GPU=2 on host=pinoak0035
ID 22 has GPU=2 on host=pinoak0035
ID 4 has GPU=0 on host=pinoak0035
ID 14 has GPU=2 on host=pinoak0036
ID 24 has GPU=0 on host=pinoak0034
ID 20 has GPU=0 on host=pinoak0035
ID 5 has GPU=1 on host=pinoak0035
ID 9 has GPU=1 on host=pinoak0034
ID 27 has GPU=3 on host=pinoak0034
ID 31 has GPU=3 on host=pinoak0036
ID 8 has GPU=0 on host=pinoak0034
ID 21 has GPU=1 on host=pinoak0035
ID 23 has GPU=3 on host=pinoak0035
ID 7 has GPU=3 on host=pinoak0035
ID 11 has GPU=3 on host=pinoak0034
ID 10 has GPU=2 on host=pinoak0034
ID 26 has GPU=2 on host=pinoak0034
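As noted above, ProcessGroup can also place serial executables rather than Python callables. The sketch below is only an illustration under a couple of assumptions: that ProcessTemplate accepts an executable name as its target (as described in the Orchestrate Processes guide) and that nvidia-smi is available on the compute nodes; it reuses find_gpus() and make_policies() from earlier.

from dragon.native.process_group import ProcessGroup, ProcessTemplate


# reuse find_gpus() and make_policies() from above
def exe_pg(nprocs=32):
    policies = make_policies(find_gpus(), nprocs=nprocs)

    pg = ProcessGroup()
    for policy in policies:
        # one instance of the executable per Policy, pinned to a single host and GPU
        pg.add_process(nproc=1, template=ProcessTemplate(target="nvidia-smi", args=["-L"], policy=policy))

    pg.init()
    pg.start()
    pg.join()
    pg.close()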