notoriouslab

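trad-zh-search: a text preprocessing tool built specifically for Traditional Chinese that can be paired on its own with mainstream search engines. It provides CKIP word segmentation plus bigram index generation, with an optional domain dictionary system.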

11
0
100% credibility
Found Mar 24, 2026 at 11 stars.
AI Analysis
Python
AI Summary

A toolkit that preprocesses Traditional Chinese text to improve search accuracy using CKIP word segmentation, character bigrams, and custom domain term lists.

How It Works

1
🔍 Notice poor search results

You're frustrated because searches on your Traditional Chinese articles miss key phrases like church names or teachings.

2
💡 Discover the helper tool

You find trad-zh-search, a simple kit made to make Chinese searches work much better.

3
📥 Bring it into your project

You easily add the tool to your setup so it's ready to use.

4
Smartly prepare your text

You feed in your articles, and the tool segments them into searchable tokens while keeping important terms whole.

5
📝 Add your special words

Optionally include a list of unique terms from your world, like names or topics, to make matches even smarter.

6
🔗 Connect to search

You share the prepared text with your website's search feature for instant improvement.

🎉 Searches shine!

Now everyone finds exactly what they need quickly, feeling the joy of spot-on results.
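
The "prepare your text" and "connect to search" steps above can be sketched in plain Python. This is a minimal illustration, not the trad-zh-search API: the function names and the `body_bigrams` field are hypothetical, and CKIP segmentation is left out so the example runs with the standard library only.

```python
import json


def bigrams(text: str) -> list[str]:
    """Overlapping two-character windows over the non-space characters."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def to_index_record(doc_id: int, title: str, body: str) -> dict:
    """Build one index-ready record (field names are assumptions).

    The original text is kept for display; the space-joined bigrams go
    into a separate field that a search engine can tokenize on spaces.
    """
    return {
        "id": doc_id,
        "title": title,
        "body": body,
        "body_bigrams": " ".join(bigrams(body)),
    }


records = [to_index_record(1, "主日信息", "聖靈充滿")]

# ensure_ascii=False keeps the Chinese characters readable in the JSON file.
payload = json.dumps(records, ensure_ascii=False)
print(payload)
```

The resulting JSON could then be handed to whichever search engine the site uses; how the fields are registered there depends on that engine.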

AI-Generated Review

What is trad-zh-search?

trad-zh-search is a Python library for preprocessing Traditional Chinese (trad-zh) text before feeding it to search engines like Meilisearch or MiniSearch. It runs CKIP segmentation for accurate word breaks, generates bigrams to catch substring matches, and applies pluggable domain dictionaries to preserve terms like organization names. A minimal install gives bigram-only mode; adding CKIP enables full segmentation. Either way, it outputs ready-to-index fields in three lines of code.
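
To make the bigram and domain-dictionary ideas concrete, here is a small sketch under stated assumptions. The names `char_bigrams`, `protect_terms`, and `index_tokens` are hypothetical and are not taken from the library; the dictionary matching shown is a simple greedy longest-match scan, which may differ from what the project actually does, and CKIP is not used.

```python
def char_bigrams(text: str) -> list[str]:
    """Overlapping two-character windows; a single character is kept as-is."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def protect_terms(text: str, terms: set[str]) -> list[str]:
    """Greedy longest-match scan: dictionary terms stay whole, the rest is split per character."""
    # Skip empty strings so the scan always advances.
    ordered = sorted((t for t in terms if t), key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        match = next((t for t in ordered if text.startswith(t, i)), None)
        if match:
            out.append(match)
            i += len(match)
        else:
            out.append(text[i])
            i += 1
    return out


def index_tokens(text: str, terms: set[str]) -> list[str]:
    """Whole dictionary terms found in the text, followed by all character bigrams."""
    found = [t for t in protect_terms(text, terms) if t in terms]
    return found + char_bigrams(text)


domain_terms = {"台北靈糧堂"}  # illustrative dictionary entry
print(index_tokens("台北靈糧堂", domain_terms))
```

Because every adjacent character pair is indexed, a partial query such as "靈糧" can still match the document, while the whole organization name remains available as a single token.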

Why is it gaining traction?

Unlike jieba, which struggles with trad-zh phrases like "聖靈充滿" or "台北靈糧堂", this combines CKIP tokens with bigrams; benchmarks show 21% better top-1 results on 8,000 articles. Adapters handle multi-field docs and synonyms for Meilisearch, or pre-tokenized JSON for MiniSearch on static sites. Dictionaries can be auto-built from your corpus via NER. Stop-bigram filtering cleans queries without emptying them.
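
One way to read "cleans queries without emptying them" is a filter that drops common bigrams but falls back to the original list when nothing would be left. The sketch below assumes that behavior; the stop list and the function name are illustrative and do not come from the project.

```python
# Illustrative stop bigrams; a real list would be tuned to the corpus.
STOP_BIGRAMS = {"我們", "他們", "可以", "因為"}


def filter_query_bigrams(bigrams: list[str], stop: set[str] = STOP_BIGRAMS) -> list[str]:
    """Remove stop bigrams from a query, but never return an empty query."""
    kept = [b for b in bigrams if b not in stop]
    # If every bigram was a stop bigram, keep the unfiltered list so the
    # search still has something to match on.
    return kept if kept else list(bigrams)


print(filter_query_bigrams(["我們", "聖靈"]))
print(filter_query_bigrams(["我們"]))
```

The fallback trades a little precision for recall: a query made only of common bigrams still returns results instead of matching nothing.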

Who should use this?

Developers building search over Taiwanese blogs, news, or Christian content where jieba fails on proper nouns. Backend teams indexing trad-zh docs in Meilisearch, and static-site builders using generators like Astro or Hugo who want client-side search. Anyone needing quick domain dictionaries for law, medicine, or org-specific terms.

Verdict

Early alpha at 11 stars and 1.0% credibility score, but thorough docs and working adapters make it usable now for trad-zh search pain points. Test it on your own data; it is a strong start if you're in the niche.
