barometech

GPT-2 124M tool-calling: 50% BFCL, 92% fresh bench. Adapter (250K) + Full FT. CPU reproducible.

85% credibility
Found May 19, 2026 at 10 stars.
AI Analysis
Python
AI Summary

This is a research project demonstrating that GPT-2 124M, a small open-source language model, can be fine-tuned to perform function calling (for example, deciding when to call a weather API or run a database query). The project provides two trained checkpoints: a lightweight adapter (250K trainable parameters, about 1 MB) that scores 50% on the Berkeley Function Calling Leaderboard (BFCL v4), and a full fine-tune (all 124M parameters, 475 MB) that scores 92% on a fresh 690-item test set with zero training contamination. Users can download the ready-made checkpoints to add function calling to their applications right away, or train custom versions from scratch with the provided scripts. The project is MIT-licensed, well documented, candid about its benchmark limitations, and reproducible on a standard laptop CPU.
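The two checkpoint sizes line up with plain parameter arithmetic. The sketch below assumes both checkpoints store weights as 32-bit floats (4 bytes each) with negligible file overhead; it counts GPT-2 small's parameters from the standard configuration (50,257-token vocabulary, 1,024 positions, 768-dim hidden size, 12 layers, output head tied to the token embedding) and converts both counts to sizes.

```python
# Sketch: estimate checkpoint sizes from parameter counts.
# ASSUMPTION: weights are stored as fp32 (4 bytes each) with no significant overhead.

BYTES_PER_FP32 = 4

def gpt2_param_count(vocab=50257, n_ctx=1024, d=768, n_layer=12):
    """Parameter count of GPT-2 small, with the LM head tied to the token embedding."""
    embeddings = vocab * d + n_ctx * d   # token + position embeddings
    per_layer = (
        2 * d                            # ln_1 (weight + bias)
        + d * 3 * d + 3 * d              # fused q/k/v projection
        + d * d + d                      # attention output projection
        + 2 * d                          # ln_2
        + d * 4 * d + 4 * d              # MLP up-projection
        + 4 * d * d + d                  # MLP down-projection
    )
    final_ln = 2 * d                     # final layer norm
    return embeddings + n_layer * per_layer + final_ln

full_params = gpt2_param_count()
adapter_params = 250_000  # trainable parameters quoted for the adapter checkpoint

full_mib = full_params * BYTES_PER_FP32 / 2**20      # mebibytes
adapter_mb = adapter_params * BYTES_PER_FP32 / 10**6 # decimal megabytes

print(f"full fine-tune: {full_params:,} params ~ {full_mib:.0f} MiB")
print(f"adapter:        {adapter_params:,} params ~ {adapter_mb:.1f} MB")
print(f"adapter trains {adapter_params / full_params:.2%} as many parameters")
```

Under that assumption the full model comes to about 474.7 MiB, matching the quoted 475 MB, and 250K fp32 parameters come to 1.0 MB; the adapter trains roughly 0.2% as many parameters as the full fine-tune.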

How It Works

1
💡 You hear about a tiny AI that can call functions

Someone mentions that a small, free AI model (GPT-2) can be taught to understand when to call tools like weather APIs or database queries.

2
📦 You download the ready-made model

You grab one of two pre-trained versions: a lightweight adapter (1 MB) or the full model (475 MB) that works right out of the box.

3
You choose your path
Use the pre-trained model

Load the weights, give the model your function descriptions, and start asking it questions immediately.

🔧 Train your own version

Feed it examples of your own tools and requests, let it learn for about an hour on your computer, and save your custom model.
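A common way to prepare such examples for causal-LM fine-tuning is to mask the loss on the prompt so the model only learns to emit the tool call. The sketch below assumes that approach; the page does not say whether the project's training scripts do this, and `build_example` is a hypothetical helper that works on already-tokenized IDs.

```python
# Hypothetical sketch: turn one (prompt, target call) pair into a supervised example.
# Prompt positions get IGNORE_INDEX so only the tool call contributes to the loss.
# -100 is the default ignore_index of PyTorch's CrossEntropyLoss.
IGNORE_INDEX = -100
GPT2_CONTEXT = 1024  # GPT-2's maximum sequence length

def build_example(prompt_ids, target_ids, eos_id, max_len=GPT2_CONTEXT):
    """Return (input_ids, labels) with the prompt masked out of the labels."""
    input_ids = list(prompt_ids) + list(target_ids) + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(target_ids) + [eos_id]
    if len(input_ids) > max_len:
        raise ValueError(f"example has {len(input_ids)} tokens; limit is {max_len}")
    return input_ids, labels

# Toy token IDs; 50256 is GPT-2's <|endoftext|> token.
input_ids, labels = build_example([10, 11, 12], [20, 21], eos_id=50256)
print(input_ids)  # [10, 11, 12, 20, 21, 50256]
print(labels)     # [-100, -100, -100, 20, 21, 50256]
```

Because Hugging Face's `GPT2LMHeadModel` shifts labels internally, `labels` stays position-aligned with `input_ids` here rather than being offset by one.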

4
🎯 You give the model your tools to learn about

You describe your functions in plain text, such as get_weather(city) or search_database(query), and the model reads those descriptions as part of its input.
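The project's exact prompt format isn't described on this page, so the following is a hypothetical sketch of how plain-text tool descriptions and a user request might be assembled into one prompt string. `format_tools`, `build_prompt`, and the `Tools:` / `User:` / `Call:` layout are illustrative assumptions, not the repo's format.

```python
# Hypothetical prompt layout: one plain-text signature per tool, then the request.

def format_tools(tools):
    """Render tool specs as 'name(param, ...): description' lines."""
    lines = []
    for name, spec in tools.items():
        params = ", ".join(spec["params"])
        lines.append(f"{name}({params}): {spec['description']}")
    return "\n".join(lines)

def build_prompt(tools, query):
    """Combine the tool list and the user's request; the model continues after 'Call:'."""
    return f"Tools:\n{format_tools(tools)}\nUser: {query}\nCall:"

tools = {
    "get_weather": {"params": ["city"], "description": "Current weather for a city"},
    "search_database": {"params": ["query"], "description": "Search stored records"},
}
print(build_prompt(tools, "What's the weather in Paris?"))
```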

5
💬 You ask questions and the model calls tools

You say 'What's the weather in Paris?' and the model responds by outputting a structured call like get_weather(city='Paris').
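Turning the model's text output into something your code can dispatch on requires parsing it. A minimal sketch, assuming the model emits a single Python-style call with keyword arguments like the get_weather example, uses the standard-library `ast` module; `parse_call` is a hypothetical helper.

```python
import ast

def parse_call(text):
    """Parse a call such as "get_weather(city='Paris')" into (name, kwargs)."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple function call: {text!r}")
    if node.args:
        raise ValueError("only keyword arguments are supported")
    # literal_eval only accepts literals, so nothing in the output is executed.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs

name, kwargs = parse_call("get_weather(city='Paris')")
print(name, kwargs)  # get_weather {'city': 'Paris'}
```

Reading the output with `ast.literal_eval` rather than `eval` means a malformed or adversarial generation can at worst raise an exception; it is never run as code.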

🎉 Your tiny AI handles function calls at 92% accuracy

On a fresh benchmark of requests and function names it never saw during training, the full fine-tune picks the right tool and arguments 92% of the time, running entirely on your own computer.
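How the benchmark scores a prediction isn't spelled out on this page. One plausible metric, shown here as a hypothetical sketch, is exact match on both the function name and the full argument dictionary, so a call with the right tool but a wrong argument value counts as a miss.

```python
# Hypothetical exact-match scorer over already-parsed calls of the form (name, kwargs).

def call_accuracy(predictions, references):
    """Fraction of items whose predicted (name, kwargs) equals the reference.
    kwargs are compared as dicts, so argument order does not matter."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one item to score")
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)

preds = [("get_weather", {"city": "Paris"}), ("search_database", {"query": "orders"})]
refs = [("get_weather", {"city": "Paris"}), ("search_database", {"query": "users"})]
print(call_accuracy(preds, refs))  # 0.5
```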

AI-Generated Review

What is gpt2-tool-call?

This project teaches GPT-2 124M -- yes, the tiny 124-million parameter model -- to call functions. It provides two trained checkpoints: a lightweight adapter that adds just 250K trainable parameters to frozen GPT-2, and a full fine-tune version. Both run entirely on CPU and achieve surprisingly competitive results on tool-calling benchmarks. The adapter hits 50% on BFCL v4, while the full fine-tune reaches 92% on a custom 690-item benchmark with zero training contamination.

Why is it gaining traction?

The hook is clear: you can train a tool-calling model on a laptop CPU in under an hour, with no GPU required. The full fine-tune shows that a small model can generalize to novel function names with 92% accuracy, a result that holds up against models 30x its size on simple single-tool tasks. The project also earns trust by disclosing training-data overlap and providing reproducible benchmarks with honest caveats.
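A simple way to look for the kind of training/evaluation overlap the project discloses is to compare function names and normalized queries across the two sets. This is a hypothetical sketch; the item fields `tool` and `query` are assumptions, and the project's actual contamination check may use different criteria (for example, n-gram overlap).

```python
# Hypothetical contamination check: shared tool names and near-identical queries.

def normalize(text):
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(text.lower().split())

def contamination_report(train_items, eval_items):
    """Each item is a dict with 'tool' (function name) and 'query' (user text)."""
    train_tools = {item["tool"] for item in train_items}
    train_queries = {normalize(item["query"]) for item in train_items}
    shared_tools = sorted({item["tool"] for item in eval_items} & train_tools)
    shared_queries = sorted(
        normalize(item["query"])
        for item in eval_items
        if normalize(item["query"]) in train_queries
    )
    return {"shared_tools": shared_tools, "shared_queries": shared_queries}

train = [{"tool": "get_weather", "query": "Weather in Rome?"}]
evals = [
    {"tool": "get_stock_price", "query": "Price of ACME?"},
    {"tool": "get_weather", "query": "weather  in rome?"},
]
print(contamination_report(train, evals))
```

A benchmark meant to test generalization to novel function names would aim for both lists to come back empty.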

Who should use this?

This is for developers building lightweight agents or exploring tool-calling without the overhead of large models. Researchers studying small language model capabilities will find the ablation studies valuable. Teams wanting to prototype tool-calling locally before scaling to larger infrastructure will get the most value here.

Verdict

The 85% credibility score comes with a caveat: this is an early-stage project with only 10 stars and limited community validation. The technical work is rigorous and the results are compelling, but the project's immaturity means you'll be blazing your own trail. Use it to learn and experiment, not for production systems.
