HongyuanLuke

Official repository for textual frequency law

48 · 0 · 89% credibility
Found Apr 05, 2026 at 48 stars
AI Analysis
Python
AI Summary

This repository implements methods from a research paper to improve large language models on math reasoning and machine translation by curating training data based on textual word frequencies.

How It Works

1
📚 Discover FrequencyLaw

You come across an idea from a research paper: the mix of common versus rare words in training data can make an AI better at solving math problems or translating languages.

2
📝 Gather Your Examples

You collect a list of math questions or sentences you want the AI to handle, like simple word problems or phrases to translate.

3
Create Variations

You use a paraphrasing helper (typically a language model) to generate many different ways of saying the same thing, keeping the meaning exactly the same while changing the wording.
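The variation step can be sketched as prompt construction for a paraphrasing model. The template, function name, and `n_variants` parameter below are illustrative assumptions, not taken from the repo.

```python
# Hypothetical sketch of step 3: asking a language model to paraphrase an example.
# The prompt wording and n_variants default are illustrative, not from the repo.

def build_paraphrase_prompt(sentence: str, n_variants: int = 5) -> str:
    """Construct an instruction asking a model for meaning-preserving rewrites."""
    return (
        f"Rewrite the following sentence in {n_variants} different ways. "
        "Keep the meaning exactly the same; only change the wording.\n\n"
        f"Sentence: {sentence}"
    )

prompt = build_paraphrase_prompt("If you have 5 apples and eat 2, how many are left?")
```

The returned string would then be sent to whatever model generates the variants; the repo's actual prompt format may differ.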

4
Pick Your Goal
🧮
Math Reasoning

Work on problems like 'If you have 5 apples and eat 2, how many left?' to train AI to solve them accurately.

🌐
Language Translation

Practice turning English sentences into other languages, such as Azerbaijani, to improve translation quality.

5
📊 Sort by Word Commonness

You score each version by how common its words are, splitting them into a high-frequency group (everyday words) and a low-frequency group (rarer words).
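The scoring-and-splitting step can be sketched as a mean log-frequency calculation. The review says the repo uses NLTK's Brown corpus for word frequencies; a tiny hard-coded reference corpus stands in here so the example stays self-contained, and the scoring function is an assumption, not the repo's actual code.

```python
# Sketch of step 5: score each paraphrase by how common its words are, then
# split into high- and low-frequency groups. A small inline corpus replaces
# the Brown corpus the repo reportedly uses, so this runs without downloads.
from collections import Counter
import math

reference_text = (
    "the cat sat on the mat the dog ate the food "
    "a cat and a dog consumed comestibles"
).split()
freq = Counter(reference_text)
total = sum(freq.values())

def mean_log_freq(sentence: str) -> float:
    """Average log-frequency of a sentence's words (higher = more common)."""
    words = sentence.lower().split()
    # Unseen words get a floor count of 1 to avoid log(0).
    return sum(math.log(freq.get(w, 1) / total) for w in words) / len(words)

variants = ["the dog ate the food", "a dog consumed comestibles"]
scores = {v: mean_log_freq(v) for v in variants}
# Split at the median score into high- and low-frequency groups.
median = sorted(scores.values())[len(scores) // 2]
high = [v for v, s in scores.items() if s >= median]
low = [v for v, s in scores.items() if s < median]
```

With a real corpus you would swap `reference_text` for e.g. `nltk.corpus.brown.words()`; the split logic stays the same.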

6
🎉 AI Gets Smarter

Your AI now performs better on math or translations because it learned from these specially grouped examples, just like the research promised!


AI-Generated Review

What is frequencylaw?

Frequencylaw is a Python toolkit that implements textual-frequency-law experiments from a research paper, letting you compute Zipf-style frequency scores for text using NLTK and Brown corpus data. It generates high/low-frequency paraphrase pairs for datasets like GSM8K math problems and FLORES-200 translations, then fine-tunes LLMs with PyTorch, Hugging Face Transformers, and LoRA to demonstrate that low-frequency prompts degrade performance on reasoning and translation. Users get an end-to-end pipeline, from dataset pairing to evaluation, reproducing results showing that frequency distillation and curriculum training boost accuracy.
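The curriculum-training idea the review describes — ordering examples from high to low word frequency — can be sketched as a simple sort. The record layout and the precomputed scores below are stand-ins, not data or code from the repo.

```python
# Sketch of frequency-ordered curriculum: train on examples with common
# wording first, rare wording last. freq_score values are illustrative
# stand-ins; the repo reportedly derives them from Brown-corpus frequencies.

examples = [
    {"text": "a dog consumed comestibles", "freq_score": -2.54},
    {"text": "the dog ate the food", "freq_score": -2.20},
    {"text": "the pet ate dinner", "freq_score": -2.35},
]

# Higher score = more common words; curriculum goes high -> low.
curriculum = sorted(examples, key=lambda ex: ex["freq_score"], reverse=True)
order = [ex["text"] for ex in curriculum]
```

The sorted list would then feed a standard fine-tuning loop (e.g. Transformers + LoRA, as the review mentions) in that order.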

Why is it gaining traction?

As the official GitHub repository for textual frequency law, it stands out by quantifying how rare words hurt LLMs, with simple scripts for frequency sorting and lightweight fine-tuning that beat baselines without full retraining. Developers dig the plug-and-play dataset tools and eval metrics, skipping manual prompt hacking. It's a quick win for validating frequency effects over generic SFT setups.

Who should use this?

LLM researchers testing prompt frequency impacts on math reasoning or low-resource translation. Fine-tuning teams on GSM8K/FLORES-200 wanting curriculum schedules via high-to-low frequency ordering. NLP devs exploring distillation without building from scratch.

Verdict

Grab it if you're in LLM optimization research: solid docs and repro code make it useful despite the 48-star count signaling early maturity. Test on your own datasets first; it lacks broad test coverage and production polish.


