mts-ai / OpenAutoNLU

Public

An open-source pipeline for training natural language understanding models

100% credibility
Found Mar 03, 2026 at 26 stars.
AI Analysis
Python
AI Summary

OpenAutoNLU is an open-source tool that automatically trains text classification and named entity recognition models from your labeled data using the best method for your dataset size, complete with data quality checks and an interactive web demo.

How It Works

1. 🔍 Discover OpenAutoNLU

You find a tool that makes it easy to train language models for understanding text, with no coding needed.

2. 📱 Open the app

Launch the web interface to get started right away.

3. 📤 Upload your examples

Choose whether you want to classify text or find names in sentences, then upload your training examples as files.

4. 🔍 Clean your data

Let it scan your data and flag messy or wrong labels so you can fix them before training.

5. ⚙️ Customize and train

Choose optional extras, such as generating more examples or spotting unknown topics, then let it build your model automatically.

6. Predict on new text

Your model is ready. Feed it sentences and get back labels or highlighted names.

AI-Generated Review

What is OpenAutoNLU?

OpenAutoNLU is an open-source Python pipeline that automates training natural language understanding models for text classification and named entity recognition. Feed it CSV or JSON train/test data in English or Russian, and it selects a training method based on dataset size: few-shot learning for tiny datasets, fine-tuning for larger ones. It also runs data quality checks and offers optional LLM augmentation, out-of-distribution (OOD) detection, and ONNX export for deployment. Developers get a Streamlit app for interactive demos, Docker support for CPU and GPU, and simple inference APIs, which removes the need to pick a method and tune hyperparameters by hand.
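To make the idea of size-based method selection concrete, here is a minimal sketch. It is not OpenAutoNLU's code: the function name `choose_method`, the per-class threshold of 16, and the rule of looking at the rarest class are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical cutoff, chosen for illustration only (not taken from OpenAutoNLU).
FEW_SHOT_MAX_PER_CLASS = 16


def choose_method(labels, few_shot_max_per_class=FEW_SHOT_MAX_PER_CLASS):
    """Pick a training method from the size of the rarest class.

    Returns "few-shot" when the smallest class has at most
    `few_shot_max_per_class` examples, otherwise "fine-tuning".
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)           # examples per class
    smallest = min(counts.values())    # size of the rarest class
    if smallest <= few_shot_max_per_class:
        return "few-shot"
    return "fine-tuning"


small = ["greeting"] * 5 + ["goodbye"] * 5
large = ["greeting"] * 200 + ["goodbye"] * 150
print(choose_method(small))
print(choose_method(large))
```

A real pipeline would likely weigh more than class counts (for example, the number of classes or text length), so treat the rule above as one possible heuristic.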

Why is it gaining traction?

Unlike manual Transformers setups or rigid frameworks, this pipeline resolves the training strategy from your data size, choosing between few-shot learning and full fine-tuning, and includes built-in diagnostics that flag bad samples. LLM-powered augmentation and synthetic test generation handle imbalanced or scarce labels without extra tools, and ONNX export simplifies deployment. Because it is open source and ships with Docker support, it can also be run inside a CI/CD workflow for NLU prototyping.
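As an illustration of what a diagnostic for bad samples might look like, the sketch below flags texts that appear more than once with different labels. This is one simple check and an assumption on our part; the review does not say which checks OpenAutoNLU actually runs, and `find_label_conflicts` is a hypothetical name.

```python
from collections import defaultdict


def find_label_conflicts(samples):
    """Return texts that occur with more than one label.

    `samples` is an iterable of (text, label) pairs. Texts are compared
    after lowercasing and collapsing whitespace, so near-identical
    duplicates are grouped together.
    """
    labels_by_text = defaultdict(set)
    for text, label in samples:
        key = " ".join(text.lower().split())  # normalize case and spacing
        labels_by_text[key].add(label)
    # Keep only texts that were labeled inconsistently.
    return {
        text: sorted(labels)
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }


data = [
    ("Book a flight", "travel"),
    ("book a  flight", "weather"),  # same text, conflicting label
    ("Hello there", "greeting"),
]
print(find_label_conflicts(data))
```

Samples flagged this way are candidates for manual review rather than automatic deletion, since either label could be the wrong one.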

Who should use this?

NLP engineers building intent classifiers or NER for voice assistants, chatbots, or RAG pipelines with limited labeled data. Teams that need quick data quality scans and OOD detection in production NLU without deep ML expertise. Russian/English bilingual projects that want a self-hosted, open-source tool for automated model training.
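For readers unfamiliar with OOD detection, one common baseline is to reject a prediction when the classifier's top softmax probability falls below a threshold. The sketch below shows that baseline only; it is not necessarily the method OpenAutoNLU uses, and the function names, the `"out_of_domain"` label, and the 0.7 threshold are assumptions.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_ood(logits, labels, threshold=0.7):
    """Return (label, confidence), or ("out_of_domain", confidence)
    when the top probability is below `threshold` (a hypothetical value)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "out_of_domain", probs[best]
    return labels[best], probs[best]


intents = ["play_music", "set_alarm", "get_weather"]
print(predict_with_ood([5.0, 0.0, 0.0], intents))  # confident prediction
print(predict_with_ood([1.0, 1.0, 1.0], intents))  # uniform scores, rejected
```

In practice the threshold would be tuned on held-out in-domain and out-of-domain examples rather than fixed in advance.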

Verdict

Promising open-source pipeline for few-shot NLU, but at 26 stars and a 1.0% credibility score, it's early. The docs are solid and include examples, but expect to make adjustments at scale. Try the Docker demo if you're prototyping language models; skip it for mission-critical work unless you are prepared to contribute.
