Amii HPC Documentation#
Welcome to Amii’s HPC documentation. This resource provides the core concepts, procedures, and best practices needed to run compute-intensive
workloads efficiently on High-Performance Computing (HPC) clusters, specifically those managed by the Alberta Machine Intelligence Institute (Amii) and its partners.
What this documentation covers#
Access & Setup: Configuring your account with the Digital Research Alliance of Canada (DRAC).
Architecture: An overview of HPC cluster design and infrastructure.
Workload Management: Mastering the Slurm scheduler for job submission and resource allocation.
Optimization: Techniques for monitoring and tuning performance on HPC systems.
Practical Resources: Reusable templates, proven workflows, and examples developed by Amii’s Engineering and Performance team.
Throughout these guides, we use Amii’s Vulcan cluster as the primary reference and example environment.
Target Audience#
While primarily designed for researchers and students with access to Amii-managed clusters, this documentation serves as a broad knowledge base. Anyone interested in the following areas will find these resources valuable:
Distributed and parallel computing
Slurm workload management
GPU optimization for AI workflows
DevOps and MLOps integration
Note
A significant portion of this content covers general HPC concepts applicable to systems beyond those managed by Amii.
Table of Contents#
Getting Started with Clusters
Basics of HPC and Slurm
Useful Resources