liangdabiao / Multimodal-RAG
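A multimodal RAG system built on multimodal embeddings, Zilliz, and Qwen visual understanding. It supports switching between two engines on each side: **Cohere / DashScope Embedding** and **DashScope / OpenRouter LLM**. Upload a PDF and ask questions in natural language; the system retrieves the most relevant pages and an AI model generates the answer. Unlike traditional RAG, this system **does no text extraction or OCR**. Instead, it treats each PDF page as an image and encodes it with a visual embedding model, which fully preserves tables, charts, layout, handwritten annotations, and all other visual information.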
A web application that lets users upload PDF files, ask natural-language questions about them, and receive AI-generated answers along with images of the relevant pages.
How It Works
You come across a handy web tool that turns your PDF documents into a smart chat buddy for finding info fast.
You start the tool on your own machine and open it in your web browser like any website.
You select a PDF file from your files and send it to the tool to get ready for questions.
The tool goes through every page of your PDF, treating each page as an image and encoding it with a visual embedding model so it can later be matched against your questions.
You type a simple question about anything in the PDF, such as 'What’s the main idea on pricing?'
You get a clear, helpful response with images of the exact pages showing the relevant details, saving you hours of searching.
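The retrieval step in the walkthrough above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it assumes each page has already been encoded into a vector, and it replaces the real embedding service and the Zilliz vector store with hand-written toy vectors and an in-memory dictionary. The names `cosine` and `top_k_pages` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_pages(query_vec, page_vecs, k=3):
    """Return the k page numbers whose vectors are most similar to the query."""
    scored = [(cosine(query_vec, vec), page_no) for page_no, vec in page_vecs.items()]
    # Highest similarity first; ties are broken by the lower page number.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [page_no for _, page_no in scored[:k]]


# Toy stand-ins for page-image embeddings (assumption: a real system would
# obtain these from a visual embedding model and keep them in a vector store).
page_vecs = {
    1: [0.9, 0.1, 0.0],
    2: [0.1, 0.9, 0.0],
    3: [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]  # toy stand-in for the embedded question

best = top_k_pages(query_vec, page_vecs, k=2)
print(best)  # prints [1, 3]
```

In a full pipeline, the images of the selected pages would then be passed to the vision-capable LLM together with the question, which is how the answer comes back with the relevant pages attached.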