Use Cases
Examples By Category
Multiprocessing
Distributed Python and interactive Jupyter
Data
Easy-to-use and HPC-optimized data exchange between applications
Telemetry
Observability for applications and workflows
Workflow
AI/HPC workflows across systems and sites
AI
Data loading and resilient training and inference
Tutorials
Data processing
Process a large dataset in parallel
Orchestrating processes
Run a serial executable many times with different arguments
Orchestrating MPI applications
Orchestrate a parameter sweep for an MPI application
Distributed PyTorch
Train with PyTorch across many GPUs
Resiliency with DDict Checkpointing
Track application state to enable fault tolerance and recovery
Running across nodes
Run on a collection of servers or a supercomputer
Jupyter
Use a Jupyter notebook with Dragon
Controlling GPU Affinity
Run functions and processes on specific GPUs
Telemetry with Grafana
Visualize and analyze system and custom metrics
Workflows
Develop a workflow that puts it all together