# Advanced RAG System

A production-grade Retrieval-Augmented Generation (RAG) system built with FastAPI, featuring real-time streaming, WebSocket support, and cutting-edge RAG techniques.

## Features
- Hybrid Search: Combines dense semantic retrieval with sparse keyword matching
- Intelligent Reranking: Cross-encoder models for optimal result ordering
- Query Expansion: Automatic synonym and semantic expansion
- Self-RAG: Confidence evaluation and adaptive retrieval
- Long-Context RAG: Handles extended document contexts efficiently
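Hybrid search means fusing the dense (semantic) and sparse (keyword) result lists into one ranking; a common fusion method is reciprocal rank fusion (RRF). Below is a minimal, dependency-free sketch — the function name and the `k` constant are illustrative, not this project's actual API:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ordering.

    Each document scores 1 / (k + rank) per list it appears in;
    k=60 is the constant commonly used in the RRF literature.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d2", "d1", "d3"]   # order from embedding similarity
sparse = ["d1", "d4", "d2"]  # order from keyword (BM25-style) search
print(reciprocal_rank_fusion([dense, sparse]))  # → ['d1', 'd2', 'd4', 'd3']
```

In production the dense list would come from the vector store and the sparse list from a keyword index, with the fused top results then passed to the cross-encoder reranker.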
- FastAPI: High-performance async web framework
- WebSockets: Real-time bidirectional communication
- Streaming: Token-by-token response streaming (SSE & WebSocket)
- Async/Await: Non-blocking I/O throughout the application
- Connection Management: Robust WebSocket connection handling
- Redis Caching: Intelligent response caching with TTL
- Connection Pooling: Efficient database connections
- Load Balancing Ready: Multi-worker support with uvloop
- Prometheus Metrics: Built-in performance monitoring
- Structured Logging: Comprehensive logging with structlog
## Tech Stack

- Framework: FastAPI with uvicorn
- Vector DB: ChromaDB for embeddings
- LLM: OpenAI GPT-4 (configurable)
- Embeddings: Sentence Transformers (BAAI/bge-large)
- Reranking: Cross-encoder models
- Caching: Redis
- Database: MongoDB for chat history
- Monitoring: Prometheus + OpenTelemetry ready
## Prerequisites

- Python 3.8+
- Redis server
- MongoDB instance
- ChromaDB server
- OpenAI API key
## Installation

```bash
git clone https://github.com/yourusername/advanced-rag-system.git
cd advanced-rag-system
pip install -r requirements.txt

# Download required NLP models
python -m spacy download en_core_web_sm
python -m nltk.downloader wordnet
```

Set the required environment variables:

```bash
export OPENAI_API_KEY="your-api-key"
export CHROMA_HOST="localhost"
export MONGODB_URI="mongodb://localhost:27017"
export REDIS_URL="redis://localhost:6379"
```

## Running the Server

```bash
# Development
uvicorn main:app --reload

# Production
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4 --loop uvloop
```

## API Endpoints

### Query

```http
POST /api/v1/query
Content-Type: application/json
```
```json
{
  "query": "What are the latest RAG techniques?",
  "user_id": "user123",
  "session_id": "session456",
  "top_k": 5,
  "temperature": 0.7,
  "stream": true
}
```

### Add Documents

```http
POST /api/v1/documents
Content-Type: application/json
```
```json
[
  {
    "id": "doc1",
    "content": "Document content here...",
    "metadata": {
      "source": "research_paper.pdf",
      "date": "2024-01-01"
    }
  }
]
```

### Chat History

```http
GET /api/v1/chat/history/{session_id}?limit=50
```

## WebSocket API

Connect to the WebSocket endpoint:
```javascript
const ws = new WebSocket('ws://localhost:8000/ws/chat/session123');

// Send a query
ws.send(JSON.stringify({
  type: 'query',
  data: {
    query: 'Your question here',
    stream: true,
    top_k: 5
  }
}));

// Receive the streaming response
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  switch (message.type) {
    case 'chunk':
      // Handle a streaming text chunk
      console.log(message.data);
      break;
    case 'sources':
      // Handle source documents
      console.log('Sources:', message.data);
      break;
    case 'complete':
      // Handle completion
      console.log('Complete:', message.data);
      break;
  }
};
```

## Configuration

### Retrieval Settings

```python
config = {
    "embedding_model": "BAAI/bge-large-en-v1.5",
    "reranker_model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
    "chroma_host": "localhost",
    "chroma_port": 8000
}
```

### Caching

- Default TTL: 300 seconds (5 minutes)
- Configurable per endpoint
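The service caches responses in Redis, but the underlying cache-aside pattern with a TTL is easy to see with a stdlib-only sketch (`TTLCache` and its keying scheme are illustrative assumptions, not the real implementation):

```python
import hashlib
import json
import time

class TTLCache:
    """In-memory stand-in for the Redis response cache."""

    def __init__(self, ttl: float = 300.0):  # default mirrors the 5-minute TTL above
        self.ttl = ttl
        self._store = {}

    def _key(self, payload: dict) -> str:
        # Stable key: hash the canonical JSON form of the request payload
        blob = json.dumps(payload, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def get(self, payload: dict):
        key = self._key(payload)
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:   # expired: evict and report a miss
            del self._store[key]
            return None
        return value

    def set(self, payload: dict, value) -> None:
        self._store[self._key(payload)] = (value, time.monotonic() + self.ttl)
```

With Redis the same shape is a `SETEX`-style write with a 300-second expiry; the TTL trades response freshness against repeated retrieval and LLM calls.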
### Generation

- Temperature: 0.0 - 2.0 (default: 0.7)
- Top-K: 1 - 20 (default: 5)
- Max tokens: Configurable per request
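In a FastAPI service these bounds would normally be enforced by a Pydantic request model; here is a stdlib-only sketch of the same validation (field names follow the query payload above, while the class name is an assumption):

```python
from dataclasses import dataclass

@dataclass
class QueryRequest:
    query: str
    top_k: int = 5            # default 5, valid range 1-20
    temperature: float = 0.7  # default 0.7, valid range 0.0-2.0
    stream: bool = True

    def __post_init__(self):
        # Reject out-of-range values before any retrieval work starts
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        if not 1 <= self.top_k <= 20:
            raise ValueError("top_k must be between 1 and 20")
```

Validating at the request boundary keeps bad parameters from reaching the retriever or the LLM call.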
## Monitoring

### Prometheus Metrics

- `rag_queries_total`: Total number of RAG queries
- `rag_query_duration_seconds`: Query processing duration
- `websocket_connections_total`: Total WebSocket connections

Access metrics at: http://localhost:8000/metrics
### Health Check

```http
GET /health
```

## Testing

```bash
# Unit tests
pytest tests/unit/

# Integration tests
pytest tests/integration/

# Load testing
locust -f tests/load/locustfile.py --host=http://localhost:8000
```

## Deployment

### Docker

```bash
docker build -t advanced-rag:latest .
docker-compose up -d
```

Example `docker-compose.yml`:

```yaml
version: '3.8'
services:
  rag-api:
    image: advanced-rag:latest
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - CHROMA_HOST=chroma
      - MONGODB_URI=mongodb://mongo:27017
      - REDIS_URL=redis://redis:6379
    depends_on:
      - chroma
      - mongo
      - redis
  chroma:
    image: chromadb/chroma:latest
    ports:
      - "8001:8000"
  mongo:
    image: mongo:latest
    ports:
      - "27017:27017"
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
```

### Kubernetes

Example deployment manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-api
  template:
    metadata:
      labels:
        app: rag-api
    spec:
      containers:
      - name: rag-api
        image: advanced-rag:latest
        ports:
        - containerPort: 8000
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: api-secrets
              key: openai-key
```

### AWS ECS

- Use Fargate for serverless deployment
- Configure ALB for load balancing
- Set up CloudWatch for logging
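On Kubernetes, the `GET /health` endpoint can also back a readiness probe; a sketch to add under the container spec (the probe timings are illustrative defaults, not values from this project):

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10
```

This keeps a replica out of Service rotation until the API actually answers health checks.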
## Performance Tuning

- Workers: Set based on CPU cores (2 * CPU cores + 1)
- Connection Pool: Adjust based on concurrent users
- Cache TTL: Balance between freshness and performance
- Chunk Size: Optimize for streaming performance
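The worker rule of thumb above can be computed at deploy time; a small, purely illustrative sketch:

```python
import multiprocessing

def recommended_workers() -> int:
    """Apply the '2 * CPU cores + 1' rule of thumb for uvicorn workers."""
    return 2 * multiprocessing.cpu_count() + 1

print(recommended_workers())
```

The result would be passed to `uvicorn main:app --workers <n>` (or the equivalent process-manager setting).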
## Contributing

- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## Acknowledgments

- FastAPI for the amazing async framework
- Anthropic and OpenAI for LLM capabilities
- The open-source RAG community for continuous innovations
Built with ❤️ for the AI community