AMD-AGI / maxtext-slurm
Toolkit for launching and observing MaxText training on Slurm-managed GPU clusters
Toolkit for easily launching and monitoring large language model training jobs on AMD GPU clusters using Slurm, with built-in dashboards and analysis tools.
How It Works
You learn about a toolkit for training large language models on Slurm-managed GPU clusters.
You download the toolkit to your cluster and run a simple single-node example to confirm that it works.
With a single command, you launch MaxText training of a large model such as Llama across many nodes, and the toolkit handles the multi-node launch for you.
You open the built-in dashboards to monitor training progress, speed, temperatures, and health checks in real time, with no extra setup.
After the job finishes, you review the charts, logs, and built-in analysis to understand performance and identify possible improvements.
You end up with a trained model and a detailed picture of how the run performed, ready for your next project.