AI Knowledge Base Platform - README (EN)
OSS implementation of an enterprise-oriented knowledge base platform. This page is a static-site-friendly README view aligned with the project documentation.
1. What You Can Do
- Chunk upload with asynchronous parsing pipeline
- Structuring for image/diagram-heavy documents
- Hybrid retrieval (vector + full-text) for Q&A
- Org-tag based access control (owner/public/org/default)
- Evaluation data logging for iterative quality improvement
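The chunked upload mentioned above can be sketched as follows. This is a minimal client-side illustration, assuming a hypothetical chunk size and metadata fields; it is not the project's actual upload API.

```python
# Sketch of client-side chunking for the asynchronous upload pipeline.
# CHUNK_SIZE and the metadata field names are illustrative assumptions.
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB per chunk (hypothetical default)

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a file body into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def chunk_metadata(filename: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> list[dict]:
    """Build per-chunk metadata a client would send with each /upload/chunk call."""
    chunks = split_into_chunks(data, chunk_size)
    return [
        {"filename": filename, "index": i, "total": len(chunks), "size": len(c)}
        for i, c in enumerate(chunks)
    ]
```

Sending chunks with an index and total lets the server detect when all parts have arrived before the merge step.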
2. Typical Use Cases
Cross-document search
Search across design docs, operation manuals, and flow diagrams.
Org-scoped sharing
Control visibility by owner/public/org boundaries.
Quality operations
Improve Recall/Precision/Faithfulness/Completeness over time.
Evidence-first answers
Return source links/images for verification.
3. Quick Start (Docker)
- Create config file:
cp .env.example .env
- Edit .env (minimum: OPENAI_API_KEY and passwords)
- Start services:
cd app
./start_docker.sh pg up
- Health check:
docker ps --format "table {{.Names}}\t{{.Status}}"
curl http://localhost:8000/health
- Stop services:
cd app
./start_docker.sh pg down
4. Minimal User Flow
- Register account (org settings)
- Upload document (scope + org tag)
- Ask in Knowledge Q&A
- Validate answer with evidence links/images
5. System Overview (Summary)
For diagrams and design details, see Architecture.
API Layer
FastAPI + WebSocket
Processing Layer
Kafka + Document Processor
Data Layer
PostgreSQL / Redis / MinIO / Elasticsearch
AI Layer
OpenAI Embedding / Chat / Vision
6. Main Flows
6.1 Upload -> Parse -> Index
- /upload/chunk: receive chunks and store temp objects
- /upload/merge: compose merged file and publish parse task
- Processor executes parse/chunk/embedding/index pipeline
- Persist metadata and mark file status DONE
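The pipeline steps above can be sketched as a chain of small stages. Everything here is a stand-in: the function bodies (naive text decoding, fixed-size chunking, a length-based placeholder instead of a real embedding call) are assumptions for illustration only.

```python
# Minimal sketch of the parse -> chunk -> embed -> index stages.
# All function bodies are illustrative stand-ins, not the real processor.

def parse(raw: bytes) -> str:
    """Extract plain text from the merged file (stand-in for real parsing)."""
    return raw.decode("utf-8", errors="ignore")

def chunk(text: str, size: int = 200) -> list[str]:
    """Split parsed text into fixed-size passages for embedding."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(passages: list[str]) -> list[list[float]]:
    """Placeholder embedding: length-based vector instead of a model call."""
    return [[float(len(p))] for p in passages]

def index(passages: list[str], vectors: list[list[float]]) -> list[dict]:
    """Pair passages with vectors as documents ready for the search index."""
    return [{"text": p, "vector": v, "status": "DONE"} for p, v in zip(passages, vectors)]

passages = chunk(parse(b"hello world " * 50))
docs = index(passages, embed(passages))
```

Keeping the stages as separate functions mirrors how an asynchronous processor can retry or parallelize each step independently.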
6.2 Question -> Retrieve -> Answer
- WebSocket chat receives user question
- Intent routing and query understanding
- Hybrid retrieval with ACL filtering
- LLM generates grounded answer from evidence context
- Usage/conversation events are logged
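The retrieval step above can be sketched as score fusion plus an ACL filter over the owner/public/org scopes from section 1. The 0.5/0.5 blending weights, the score dictionaries, and the scope field names are illustrative assumptions, not the project's actual ranking logic.

```python
# Sketch of hybrid retrieval: blend vector and full-text scores,
# then drop documents the requesting user is not allowed to see.

def allowed(doc: dict, user_org: str, user_id: str) -> bool:
    """ACL check over hypothetical owner/public/org scope fields."""
    scope = doc["scope"]
    if scope == "public":
        return True
    if scope == "org":
        return doc["org"] == user_org
    return doc["owner"] == user_id  # "owner" scope

def hybrid_rank(docs, vector_hits, text_hits, user_org, user_id, alpha=0.5):
    """Weighted blend of two score maps, ACL-filtered, sorted descending."""
    scores = {}
    for doc_id, s in vector_hits.items():
        scores[doc_id] = alpha * s
    for doc_id, s in text_hits.items():
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) * s
    visible = [(d, s) for d, s in scores.items() if allowed(docs[d], user_org, user_id)]
    return sorted(visible, key=lambda x: -x[1])
```

Filtering after fusion keeps the two retrievers independent; a production system would usually push the ACL filter down into the query instead.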
7. Key APIs
| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/auth/register | User registration |
| POST | /api/v1/auth/login | Login |
| POST | /api/v1/upload/chunk | Chunk upload |
| POST | /api/v1/upload/merge | Finalize upload |
| GET | /api/v1/search/hybrid | Hybrid search |
| WS | /api/v1/chat?token=... | Q&A chat |
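A small sketch of how a client might address these endpoints. The paths come from the table; BASE_URL and the WebSocket host are assumptions matching the local Quick Start setup.

```python
# Hypothetical endpoint registry built from the API table above.
BASE_URL = "http://localhost:8000"

ENDPOINTS = {
    "register": ("POST", "/api/v1/auth/register"),
    "login": ("POST", "/api/v1/auth/login"),
    "upload_chunk": ("POST", "/api/v1/upload/chunk"),
    "upload_merge": ("POST", "/api/v1/upload/merge"),
    "hybrid_search": ("GET", "/api/v1/search/hybrid"),
}

def url_for(name: str) -> str:
    """Resolve a logical endpoint name to a full HTTP URL."""
    _method, path = ENDPOINTS[name]
    return f"{BASE_URL}{path}"

def ws_url(token: str) -> str:
    """Chat is a WebSocket endpoint; the token is passed as a query parameter."""
    return f"ws://localhost:8000/api/v1/chat?token={token}"
```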
8. Operational Notes
- Replace all secrets in .env before production use
- Default deployment is single-node oriented
- Tune ES/Kafka/OpenAI parameters for real workload
- Copy .env.example to .env before first run
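The secret-replacement note above can be enforced with a small pre-flight check. The required key names and placeholder values below are illustrative assumptions; adjust them to the keys your .env.example actually defines.

```python
# Sketch of a .env pre-flight check: verify required keys exist and
# that placeholder secrets were actually replaced before production use.
REQUIRED = ["OPENAI_API_KEY", "POSTGRES_PASSWORD"]  # assumed key names
PLACEHOLDERS = {"changeme", "your-api-key-here", ""}

def check_env(text: str) -> list[str]:
    """Return a list of problems found in .env-style text; empty means OK."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    problems = []
    for key in REQUIRED:
        if key not in values:
            problems.append(f"missing {key}")
        elif values[key] in PLACEHOLDERS:
            problems.append(f"placeholder value for {key}")
    return problems
```

Running such a check at container startup fails fast instead of booting with demo credentials.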
9. Known Limitations (v0.1.0-draft)
- Single-node deployment by default
- No built-in horizontal autoscaling package
- Some defaults are for demo/staging, not full production hardening
10. Additional Documents
- Japanese guide: readme-ja.html
- Architecture notes: architecture.html
- Security policy: security.html
- Contributing guide: contributing.html
- Release notes: release-notes.html
11. Security Reporting
See security.html for vulnerability reporting and SLA policy.
12. Contributing
See contributing.html for development setup and PR rules.
13. Release Notes
See release-notes.html for current milestone notes.
14. License
See license.html (Apache-2.0 summary and source link).