Multimodal Analyst
PDF & Image Q&A
A full-stack Next.js app backed by a multi-modal RAG pipeline. It extracts both text and images from documents, encodes them, and answers complex structural queries using a vision LLM. Ideal for analyzing reports, flowcharts, and slide decks.
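The extract, encode, and answer flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the `Chunk` type, the `embed`, `retrieve`, and `build_prompt` helpers, and the sample locators are all hypothetical, and the bag-of-words "encoder" stands in for the real text and image embedding models and the vector DB.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    kind: str      # "text" or "image"
    source: str    # hypothetical locator, e.g. "report.pdf#page=5"
    content: str   # extracted text, or a caption describing the image


def embed(text: str) -> Counter:
    """Toy encoder (assumption): bag-of-words counts instead of a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Rank text and image chunks together and keep the k best matches."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c.content)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits: list[Chunk]) -> dict:
    """Split hits by modality: text goes in as context, images are passed by reference."""
    return {
        "question": question,
        "text_context": [c.content for c in hits if c.kind == "text"],
        "image_refs": [c.source for c in hits if c.kind == "image"],
    }


corpus = [
    Chunk("text", "report.pdf#page=2", "quarterly revenue grew in the north region"),
    Chunk("image", "report.pdf#page=5", "flowchart of the approval process"),
]
question = "which flowchart shows the approval process"
hits = retrieve(question, corpus, k=1)
prompt = build_prompt(question, hits)
```

In a real deployment the returned `image_refs` would be resolved to image bytes and sent to the vision LLM alongside the text context. The point of the sketch is only that both modalities share one retrieval step.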
Stack
Frontend: Next.js
Architecture: Multi-modal RAG
Framework: raw Python
Vector DB: Qdrant
Corpus: medium
Complexity: High
How it works
The frontend talks only to the backend; your API keys + pipeline URL stay server-side.
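One way to picture this separation is a backend handler that attaches the secrets itself and returns only an allow-listed payload to the browser. The sketch below is an assumption-laden illustration, not the project's code: the environment variable names `LLM_API_KEY` and `PIPELINE_URL`, the helper names, and the response fields `answer` and `sources` are all hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    api_key: str
    pipeline_url: str


def load_config(env=None) -> ServerConfig:
    """Read secrets from the server's environment (variable names are assumptions)."""
    env = os.environ if env is None else env
    return ServerConfig(api_key=env["LLM_API_KEY"], pipeline_url=env["PIPELINE_URL"])


def build_upstream_request(question: str, cfg: ServerConfig) -> dict:
    """Describe the server-to-pipeline call; the key is attached here, never in the browser."""
    return {
        "url": cfg.pipeline_url,
        "headers": {"Authorization": f"Bearer {cfg.api_key}"},
        "json": {"question": question},
    }


def to_client_payload(upstream_json: dict) -> dict:
    """Allow-list the fields sent back to the frontend so nothing internal leaks."""
    allowed = ("answer", "sources")
    return {key: upstream_json[key] for key in allowed if key in upstream_json}


# Demo with a fake environment and a fake pipeline response.
cfg = load_config({"LLM_API_KEY": "sk-test", "PIPELINE_URL": "http://pipeline.internal/query"})
request = build_upstream_request("What does slide 3 show?", cfg)
payload = to_client_payload({
    "answer": "A funnel chart",
    "sources": ["deck.pdf#page=3"],
    "debug": {"api_key": "sk-test"},  # internal field that must not reach the browser
})
```

Using an allow-list rather than deleting known-sensitive fields means a new internal field added to the pipeline response stays private by default.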
Get started
cp .env.example .env   # add your API key(s)
docker compose up --build

cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env   # add your API key(s)
python pipeline.py ingest ./corpus
uvicorn serve:app --reload   # http://localhost:8000

cd frontend
npm install
cp .env.local.example .env.local
npm run dev   # http://localhost:3000
The downloaded README.md has the full guide: vector DB setup, API keys, and deployment to Render/Railway + Vercel.
Project structure
25 backend · 14 frontend · 43 files total
Ready to build?
Download the full monorepo and follow the README.