A multilingual benchmark for evaluating the industrial domain knowledge of LLMs.
IndustryBench provides a dataset of industrial procurement questions grounded in official standards, together with an evaluation script for testing large language models' knowledge across multiple languages.
How It Works
You come across this benchmark, through a research paper or an online search, while looking for ways to test how well AI understands real-world industrial products and standards.
You read about its collection of more than 2,000 industrial procurement questions, grounded in official standards and available in Chinese, English, Russian, and Vietnamese.
You get the full set of questions, correct answers, and background facts from the Hugging Face page, with nothing extra to download.
You save the questions to a simple file and connect your AI service so it can answer them as an industrial expert would.
You launch the evaluation and watch as your AI answers each question; each answer is scored for accuracy on a 0-3 scale and checked for safety issues.
You review the results: average scores, breakdowns by difficulty and industry, and any safety flags.
You now have a clear picture of your AI's strengths and gaps in industrial knowledge, which you can use to improve it or to compare it with other models.
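The workflow above (save the questions, run each one through a model, grade it on a 0-3 scale, flag safety issues, then aggregate) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the repository's evaluation script: the JSONL file format, the field names `question`, `answer`, `difficulty`, and `industry`, and the helpers `save_questions`, `load_questions`, `evaluate`, and `summarize` are all hypothetical. `ask_model`, `grade`, and `is_unsafe` are placeholders for your AI service, the benchmark's grader, and its safety check.

```python
import json
import os
import tempfile
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List

# Hypothetical record fields: "question", "answer", "difficulty", "industry".
# The real IndustryBench schema may use different names; check the dataset card.


def save_questions(records: Iterable[dict], path: str) -> None:
    """Write one JSON object per line (JSONL), keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def load_questions(path: str) -> List[dict]:
    """Read the JSONL file back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def evaluate(
    records: List[dict],
    ask_model: Callable[[str], str],
    grade: Callable[[str, str], int],
    is_unsafe: Callable[[str], bool],
) -> List[dict]:
    """Ask the model each question, grade the reply 0-3, and attach a safety flag."""
    results = []
    for rec in records:
        reply = ask_model(rec["question"])
        score = grade(reply, rec["answer"])
        if score not in (0, 1, 2, 3):
            raise ValueError(f"grader returned {score!r}; expected an integer 0-3")
        results.append({**rec, "reply": reply, "score": score, "unsafe": is_unsafe(reply)})
    return results


def summarize(results: List[dict]) -> Dict[str, object]:
    """Mean score overall and per difficulty/industry, plus the count of safety flags."""
    if not results:
        raise ValueError("no results to summarize")
    by_difficulty: Dict[str, List[int]] = defaultdict(list)
    by_industry: Dict[str, List[int]] = defaultdict(list)
    for r in results:
        by_difficulty[r["difficulty"]].append(r["score"])
        by_industry[r["industry"]].append(r["score"])
    return {
        "mean_score": mean([r["score"] for r in results]),
        "by_difficulty": {k: mean(v) for k, v in by_difficulty.items()},
        "by_industry": {k: mean(v) for k, v in by_industry.items()},
        "safety_flags": sum(1 for r in results if r["unsafe"]),
    }


if __name__ == "__main__":
    # Toy placeholder records, not real benchmark items.
    toy = [
        {"question": "q1", "answer": "a1", "difficulty": "easy", "industry": "steel"},
        {"question": "q2", "answer": "a2", "difficulty": "hard", "industry": "steel"},
        {"question": "q3", "answer": "a3", "difficulty": "hard", "industry": "valves"},
    ]
    path = os.path.join(tempfile.mkdtemp(), "questions.jsonl")
    save_questions(toy, path)
    canned = {"q1": "a1", "q2": "wrong", "q3": "a3"}  # stand-in for a real model call
    results = evaluate(
        load_questions(path),
        ask_model=lambda q: canned[q],
        grade=lambda reply, ref: 3 if reply == ref else 0,  # crude exact-match stand-in
        is_unsafe=lambda reply: False,  # stand-in for a real safety check
    )
    print(summarize(results))
```

Keeping the model call, grader, and safety check as injected callables lets the same loop work with any API client; in practice the grader would be whatever scoring method the benchmark actually uses rather than the exact-match stand-in shown here.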