Use Cases
Examples By Category
Multiprocessing
Distributed Python and interactive Jupyter
Data
Easy-to-use and HPC-optimized data exchange between applications
Telemetry
Observability for applications and workflows
Workflow
AI/HPC workflows across systems and sites
AI
Data loading and resilient training and inference
Tutorials
Data processing
Process a large dataset in parallel
Orchestrating processes
Orchestrate runs of a serial executable with many different arguments
Orchestrating MPI applications
Orchestrate a parameter sweep for an MPI application
Distributed PyTorch
Train with PyTorch across many GPUs
Workflows
Develop a workflow that puts it all together
Running across nodes
Run on a collection of servers or a supercomputer
Jupyter
Use a Jupyter notebook with Dragon
Controlling GPU Affinity
Running functions and processes that use specific GPUs
Telemetry with Grafana
Visualize and analyze system and custom metrics
Debugging
Best practices for debugging