AI Knowledge Base Platform - README (EN)
OSS implementation of an enterprise-oriented knowledge base platform. This page is a static-site-friendly README view aligned with the project documentation.
1. What You Can Do
- Chunk upload with asynchronous parsing pipeline
- Structuring for image/diagram-heavy documents
- Hybrid retrieval (vector + full-text) for Q&A
- Org-tag based access control (owner/public/org/default)
- Evaluation data logging for iterative quality improvement
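The chunked upload mentioned above can be sketched as follows. This is a minimal client-side illustration, assuming a hypothetical chunk size and metadata fields; it is not the project's actual upload API.

```python
# Sketch of client-side chunking for the asynchronous upload pipeline.
# CHUNK_SIZE and the metadata field names are illustrative assumptions.
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB per chunk (hypothetical default)

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a file body into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def chunk_metadata(filename: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> list[dict]:
    """Build per-chunk metadata a client would send with each /upload/chunk call."""
    chunks = split_into_chunks(data, chunk_size)
    return [
        {"filename": filename, "index": i, "total": len(chunks), "size": len(c)}
        for i, c in enumerate(chunks)
    ]
```

Sending chunks with an index and total lets the server detect when all parts have arrived before the merge step.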
2. Typical Use Cases
Cross-document search
Search across design docs, operation manuals, and flow diagrams.
Org-scoped sharing
Control visibility by owner/public/org boundaries.
Quality operations
Improve Recall/Precision/Faithfulness/Completeness over time.
Evidence-first answers
Return source links/images for verification.
3. Quick Start (Docker)
- Create config file:
cp .env.example .env
- Edit .env (minimum: OPENAI_API_KEY and passwords)
- Start services:
cd app
./start_docker.sh pg up
- Health check:
docker ps --format "table {{.Names}}\t{{.Status}}"
curl http://localhost:8000/health
- Stop services:
cd app
./start_docker.sh pg down
4. Minimal User Flow
- Register account (org settings)
- Upload document (scope + org tag)
- Ask in Knowledge Q&A
- Validate answer with evidence links/images
5. System Overview (Summary)
For diagrams and design details, see Architecture.
API Layer
FastAPI + WebSocket
Processing Layer
Kafka + Document Processor
Data Layer
PostgreSQL / Redis / MinIO / Elasticsearch
AI Layer
OpenAI Embedding / Chat / Vision
6. Main Flows
6.1 Upload -> Parse -> Index
- /upload/chunk: receive chunks and store temp objects
- /upload/merge: compose merged file and publish parse task
- Processor executes parse/chunk/embedding/index pipeline
- Persist metadata and mark file status DONE
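The pipeline steps above can be sketched as a chain of small stages. Everything here is a stand-in: the function bodies (naive text decoding, fixed-size chunking, a length-based placeholder instead of a real embedding call) are assumptions for illustration only.

```python
# Minimal sketch of the parse -> chunk -> embed -> index stages.
# All function bodies are illustrative stand-ins, not the real processor.

def parse(raw: bytes) -> str:
    """Extract plain text from the merged file (stand-in for real parsing)."""
    return raw.decode("utf-8", errors="ignore")

def chunk(text: str, size: int = 200) -> list[str]:
    """Split parsed text into fixed-size passages for embedding."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(passages: list[str]) -> list[list[float]]:
    """Placeholder embedding: length-based vector instead of a model call."""
    return [[float(len(p))] for p in passages]

def index(passages: list[str], vectors: list[list[float]]) -> list[dict]:
    """Pair passages with vectors as documents ready for the search index."""
    return [{"text": p, "vector": v, "status": "DONE"} for p, v in zip(passages, vectors)]

passages = chunk(parse(b"hello world " * 50))
docs = index(passages, embed(passages))
```

Keeping the stages as separate functions mirrors how an asynchronous processor can retry or parallelize each step independently.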
6.2 Question -> Retrieve -> Answer
- WebSocket chat receives user question
- Intent routing and query understanding
- Hybrid retrieval with ACL filtering
- LLM generates grounded answer from evidence context
- Usage/conversation events are logged
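The retrieval step above can be sketched as score fusion plus an ACL filter over the owner/public/org scopes from section 1. The 0.5/0.5 blending weights, the score dictionaries, and the scope field names are illustrative assumptions, not the project's actual ranking logic.

```python
# Sketch of hybrid retrieval: blend vector and full-text scores,
# then drop documents the requesting user is not allowed to see.

def allowed(doc: dict, user_org: str, user_id: str) -> bool:
    """ACL check over hypothetical owner/public/org scope fields."""
    scope = doc["scope"]
    if scope == "public":
        return True
    if scope == "org":
        return doc["org"] == user_org
    return doc["owner"] == user_id  # "owner" scope

def hybrid_rank(docs, vector_hits, text_hits, user_org, user_id, alpha=0.5):
    """Weighted blend of two score maps, ACL-filtered, sorted descending."""
    scores = {}
    for doc_id, s in vector_hits.items():
        scores[doc_id] = alpha * s
    for doc_id, s in text_hits.items():
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) * s
    visible = [(d, s) for d, s in scores.items() if allowed(docs[d], user_org, user_id)]
    return sorted(visible, key=lambda x: -x[1])
```

Filtering after fusion keeps the two retrievers independent; a production system would usually push the ACL filter down into the query instead.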
7. Key APIs
| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/auth/register | User registration |
| POST | /api/v1/auth/login | Login |
| POST | /api/v1/upload/chunk | Chunk upload |
| POST | /api/v1/upload/merge | Finalize upload |
| GET | /api/v1/search/hybrid | Hybrid search |
| WS | /api/v1/chat?token=... | Q&A chat |
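A small sketch of how a client might address these endpoints. The paths come from the table; BASE_URL and the WebSocket host are assumptions matching the local Quick Start setup.

```python
# Hypothetical endpoint registry built from the API table above.
BASE_URL = "http://localhost:8000"

ENDPOINTS = {
    "register": ("POST", "/api/v1/auth/register"),
    "login": ("POST", "/api/v1/auth/login"),
    "upload_chunk": ("POST", "/api/v1/upload/chunk"),
    "upload_merge": ("POST", "/api/v1/upload/merge"),
    "hybrid_search": ("GET", "/api/v1/search/hybrid"),
}

def url_for(name: str) -> str:
    """Resolve a logical endpoint name to a full HTTP URL."""
    _method, path = ENDPOINTS[name]
    return f"{BASE_URL}{path}"

def ws_url(token: str) -> str:
    """Chat is a WebSocket endpoint; the token is passed as a query parameter."""
    return f"ws://localhost:8000/api/v1/chat?token={token}"
```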
8. Operational Notes
- Replace all secrets in .env before production use
- Default deployment is single-node oriented
- Tune ES/Kafka/OpenAI parameters for real workload
- Copy .env.example to .env before first run
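The secret-replacement note above can be enforced with a small pre-flight check. The required key names and placeholder values below are illustrative assumptions; adjust them to the keys your .env.example actually defines.

```python
# Sketch of a .env pre-flight check: verify required keys exist and
# that placeholder secrets were actually replaced before production use.
REQUIRED = ["OPENAI_API_KEY", "POSTGRES_PASSWORD"]  # assumed key names
PLACEHOLDERS = {"changeme", "your-api-key-here", ""}

def check_env(text: str) -> list[str]:
    """Return a list of problems found in .env-style text; empty means OK."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    problems = []
    for key in REQUIRED:
        if key not in values:
            problems.append(f"missing {key}")
        elif values[key] in PLACEHOLDERS:
            problems.append(f"placeholder value for {key}")
    return problems
```

Running such a check at container startup fails fast instead of booting with demo credentials.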
9. Known Limitations (v0.1.0-draft)
- Single-node deployment by default
- No built-in horizontal autoscaling package
- Some defaults are for demo/staging, not full production hardening
10. Additional Documents
- Japanese guide: readme-ja.html
- Architecture notes: architecture.html
- Security policy: security.html
- Contributing guide: contributing.html
- Release notes: release-notes.html
11. Security Reporting
See security.html for vulnerability reporting and SLA policy.
12. Contributing
See contributing.html for development setup and PR rules.
13. Release Notes
See release-notes.html for current milestone notes.
14. License
See license.html (Apache-2.0 summary and source link).