Resources
Links for NGC Tools
HPC SDK Blog/Customer Presentation/GTC Presentation
Cool tools - NGC Container Tools (the articles contain links to the GitHub Repos):
NGC Module Files – TACC lmod based container user tool
NVIDIA
SDK List
MIG Mode Notes
“Users should note the following considerations when the A100 is in MIG mode:
- No graphics APIs are supported (e.g. OpenGL, Vulkan etc.)
- No GPU to GPU P2P (either PCIe or NVLink) is supported
- CUDA applications treat a Compute Instance and its parent GPU Instance as a single CUDA device. See this section on device enumeration by CUDA
- CUDA IPC across GPU instances is not supported. CUDA IPC across Compute instances is supported
- CUDA debugging (e.g. using cuda-gdb) and memory/race checking (e.g. using cuda-memcheck or compute-sanitizer) is supported
- CUDA MPS is supported on top of MIG. The only limitation is that the maximum number of clients (48) is lowered proportionally to the Compute Instance size
- GPUDirect RDMA is supported when used from GPU Instances”
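The MPS note above says the 48-client maximum is lowered in proportion to the Compute Instance size. A minimal sketch of that arithmetic follows. The 48-client figure comes from the quoted note; the 7-slice total for an A100, the round-down behavior, and the helper name `estimated_mps_clients` are assumptions made for illustration, so treat the results as rough estimates rather than documented limits.

```python
# Illustrative only: the 48-client ceiling comes from the quoted note.
# The 7-slice total and the round-down rule are assumptions for this sketch.
FULL_GPU_MPS_CLIENTS = 48
TOTAL_COMPUTE_SLICES = 7  # assumed number of compute slices on an A100


def estimated_mps_clients(compute_slices: int) -> int:
    """Scale the MPS client ceiling by the share of compute slices (hypothetical helper)."""
    if not 1 <= compute_slices <= TOTAL_COMPUTE_SLICES:
        raise ValueError(
            f"compute_slices must be between 1 and {TOTAL_COMPUTE_SLICES}"
        )
    # Integer division rounds down, so the estimate never exceeds the proportional share.
    return FULL_GPU_MPS_CLIENTS * compute_slices // TOTAL_COMPUTE_SLICES


if __name__ == "__main__":
    for slices in (1, 2, 3, 4, 7):
        print(f"{slices} slice(s): about {estimated_mps_clients(slices)} MPS clients")
```

Under these assumptions, a full 7-slice Compute Instance keeps the 48-client ceiling and smaller instances get a proportionally smaller estimate. The actual limit on a given system should be confirmed against the NVIDIA MIG and MPS documentation.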
NVIDIA Virtual Compute Server
NVIDIA Multi-Instance GPU and NVIDIA Virtual Compute Server
XALT Blog
Maximizing Data Center Productivity with Application Workload Analysis
DCGM
Job Statistics with NVIDIA Data Center GPU Manager and SLURM