xaskasdf / ntransformer
High-efficiency LLM inference engine in C++/CUDA. Run Llama 70B on RTX 3090.
NTransformer is an inference engine for running large language models on a single consumer GPU. It does this by splitting the model's memory footprint across the graphics card's memory, system RAM, and, optionally, direct access to storage.
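To make the idea of spreading a model across memory tiers concrete, here is a minimal sketch of a greedy placement plan: fill the fastest tier first, then spill the remaining layers to the next one. It is an illustration only, not NTransformer's actual algorithm. The `Tier` class, the `place_layers` function, the tier names, and all sizes are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """A memory tier, listed from fastest to slowest (hypothetical model)."""
    name: str
    capacity_gb: float


def place_layers(n_layers: int, layer_gb: float, tiers: list[Tier]) -> dict[str, int]:
    """Greedily assign layers to the fastest tier that still has room."""
    placement = {tier.name: 0 for tier in tiers}
    remaining = n_layers
    for tier in tiers:
        # How many whole layers fit in this tier's budget.
        fit = min(remaining, int(tier.capacity_gb // layer_gb))
        placement[tier.name] = fit
        remaining -= fit
    if remaining:
        raise ValueError(f"{remaining} layers do not fit in any tier")
    return placement


# Assumed, illustrative numbers: 80 layers of 0.5 GB each (about 40 GB total),
# with part of a 24 GB card reserved for other buffers.
tiers = [Tier("vram", 20.0), Tier("ram", 12.0), Tier("nvme", 1000.0)]
plan = place_layers(80, 0.5, tiers)
print(plan)
```

With these assumed budgets, the model does not fit in GPU memory alone, so the plan keeps as many layers as possible on the card and pushes the rest to RAM and then storage. A real engine would also need to account for activation buffers, caches, and transfer bandwidth between tiers.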
How It Works
You learn about a tool that lets an ordinary gaming PC run very large language models without datacenter hardware.
You confirm that your computer has a capable graphics card and runs Linux.
You download a compact model file that contains the weights used for chat and text generation.
For the largest models, you copy the file to a fast storage drive so the engine can read it more quickly.
You start the program with your model file and begin typing questions or prompts, and responses are generated quickly.
Hold back-and-forth conversations with the model.
Generate stories or answers, or test generation speed, with custom prompts.
Your home computer produces fast, capable responses, putting large-model AI within reach of consumer hardware.