YutoTerashima

English, Japanese, and Chinese LLM safety mini-benchmark.

17 stars · 100% credibility
Found May 03, 2026 at 16 stars
AI Analysis
Python
AI Summary

This project is a benchmark for testing AI language model safety behaviors like refusal and compliance across English, Japanese, and Chinese, with tools to run classification experiments and generate analysis reports.
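The report-generation side of that workflow can be pictured with a minimal sketch. The metric names and numbers below are invented for illustration and are not taken from the project:

```python
# Illustrative only: turn per-language safety metrics into a Markdown
# summary table, the kind of report an analysis step might produce.
# All metric names and values here are made up.
metrics = {
    "en": {"refusal_recall": 0.94, "over_refusal_rate": 0.03},
    "ja": {"refusal_recall": 0.88, "over_refusal_rate": 0.07},
    "zh": {"refusal_recall": 0.90, "over_refusal_rate": 0.05},
}

lines = [
    "| Language | Refusal recall | Over-refusal rate |",
    "|---|---|---|",
]
for lang, m in sorted(metrics.items()):
    lines.append(
        f"| {lang} | {m['refusal_recall']:.2f} | {m['over_refusal_rate']:.2f} |"
    )

report = "\n".join(lines)
print(report)
```

A table like this is easy to drop into a Markdown report alongside figures and per-language breakdowns.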

How It Works

1. 🔍 Discover the safety tester

You find a tool on GitHub that checks how safely AI chat models respond in English, Japanese, and Chinese.

2. 📖 Read the friendly guide

The instructions explain clearly what the benchmark tests and how to get started in a few simple steps.

3. 🚀 Set it up quickly

You install the tool on your computer with a quick setup, much like downloading a new app.

4. 🧪 Try a quick check

You run a small smoke test and immediately see results on the model's safe responses in each language.

5. ⚡ Run the full study

You launch the full evaluation, using your computer's GPU to analyze tens of thousands of examples.

6. 📊 Explore the reports

Charts, tables, and summaries show how well safety behavior holds up across languages.

7. 🎉 Gain new insights

You now understand how AI safety differs across languages and can share your findings.
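The workflow above can be sketched end-to-end as a toy Python script. Everything here is illustrative: the examples, labels, and keyword detector are invented stand-ins for the repo's real data and classifiers:

```python
from collections import defaultdict

# Hypothetical smoke-test data: tiny hand-labeled responses per language.
# Labels: "refusal" (model declined) vs "compliance" (model answered).
examples = [
    ("en", "I can't help with that request.", "refusal"),
    ("en", "Sure, here is the recipe you asked for.", "compliance"),
    ("ja", "申し訳ありませんが、お手伝いできません。", "refusal"),
    ("ja", "もちろん、こちらがレシピです。", "compliance"),
    ("zh", "很抱歉,我无法协助这个请求。", "refusal"),
    ("zh", "当然,这是你要的食谱。", "compliance"),
]

# Toy keyword detector standing in for a trained classifier.
REFUSAL_MARKERS = ("can't", "申し訳", "抱歉")

def predict(text: str) -> str:
    return "refusal" if any(m in text for m in REFUSAL_MARKERS) else "compliance"

# Per-language accuracy: the kind of breakdown a full run reports.
correct = defaultdict(int)
total = defaultdict(int)
for lang, text, gold in examples:
    total[lang] += 1
    correct[lang] += predict(text) == gold

for lang in sorted(total):
    print(f"{lang}: {correct[lang]}/{total[lang]} correct")
```

The real benchmark replaces the keyword rule with trained models and scales the loop to tens of thousands of examples, but the shape of the per-language report is the same.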

AI-Generated Review

What is multilingual-llm-safety-bench?

This Python tool runs compact safety benchmarks for LLMs across English, Japanese, and Chinese, testing behaviors like refusal, over-refusal, and unsafe compliance on synthetic prompts. It fills a gap in multilingual safety evals by pulling data from datasets like multilingual-safety-classification-dataset, running GPU-backed classification baselines (TF-IDF, MLP), and emitting metrics, per-language breakdowns, figures, and Markdown reports. Users can run quick smoke tests or full 80k-example runs via simple scripts, with redacted failure analysis to avoid reproducing sensitive content.
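A TF-IDF + MLP baseline of the kind described can be sketched with scikit-learn. The training snippets and labels below are invented placeholders, and the real bench trains on a much larger dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical training data; the actual bench uses ~80k labeled examples.
texts = [
    "I cannot assist with that.", "Here is how to bake bread.",
    "申し訳ありませんが、できません。", "パンの焼き方はこちらです。",
    "很抱歉,我不能帮忙。", "这是烤面包的方法。",
] * 10  # repeat so the tiny MLP has something to fit
labels = ["refusal", "compliance"] * 3 * 10

# Character n-grams sidestep word segmentation in Japanese and Chinese.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
model.fit(texts, labels)

print(model.predict(["申し訳ありませんが、お手伝いできません。"])[0])
```

Swapping the MLP for a linear model, or the char analyzer for a word analyzer, gives the kind of baseline comparison the reports summarize per language.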

Why is it gaining traction?

Unlike English–Japanese translation tools or vocabulary-focused projects, this bench reveals real safety gaps in non-English settings, such as script effects in Chinese and Japanese, where character n-grams beat word-level models. Devs like the reproducible conda-based GPU workflows that auto-generate CSVs, JSONs, confusion matrices, and group slices (e.g., amh_Ethi unsafe recall), making it easy to spot over-refusal in low-resource languages without building an eval from scratch.
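The "char n-grams beat word models" point is easy to demonstrate: scikit-learn's default word tokenizer treats an unsegmented Chinese sentence as a single token, while a character analyzer extracts many reusable features. The sentences below are invented examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two made-up Chinese sentences with no word boundaries.
zh = ["很抱歉我无法协助这个请求", "这是烤面包的方法"]

word_vec = TfidfVectorizer()                              # default word tokenizer
char_vec = TfidfVectorizer(analyzer="char", ngram_range=(1, 2))

word_vec.fit(zh)
char_vec.fit(zh)

# Each unsegmented sentence collapses to a single "word" feature,
# while the character analyzer yields many overlapping n-grams.
print(len(word_vec.vocabulary_))
print(len(char_vec.vocabulary_))
```

With only whole-sentence tokens, a word-level model cannot generalize across Chinese or Japanese prompts, which is why character-level features dominate for those scripts.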

Who should use this?

AI safety engineers tuning multilingual moderation for chat apps that handle English, Japanese, or Chinese prompts. Researchers prototyping baselines before fine-tuning mBERT on safety data. Moderation teams at Discord-like platforms auditing how LLM safety behavior transfers from English to Japanese or Chinese scripts.

Verdict

Solid research prototype for multilingual safety benchmarking, with strong docs and pytest coverage, but its low maturity at 17 stars limits production trust. Grab it for quick per-language evals if you're experimenting with non-English LLM risks.

