A collection of production-ready system design blueprints, architectural patterns, and scalability strategies, built from real-world experience designing systems that handle millions of requests per second.
This repository provides architectural patterns and system design templates used in production environments at scale. Each design includes capacity estimation, component deep-dives, failure scenarios, and infrastructure-as-code examples.
Target Audience: Senior Engineers, Solution Architects, Technical Leads, and Engineering Managers preparing for system design interviews or designing real systems.
- System Design Interview Templates
- Architectural Patterns
- Cloud Reference Architectures
- Data Engineering
- Observability & Reliability
- Infrastructure as Code
- Performance Engineering
Production-grade designs for common system design interview questions. Each template follows a structured approach: requirements gathering, capacity estimation, high-level design, deep-dive components, and failure analysis.
| System | Complexity | Key Technologies | Status |
|---|---|---|---|
| URL Shortener | Entry | DynamoDB, Redis, CloudFront | Complete |
| Rate Limiter | Entry | Redis, Token Bucket, Sliding Window | Complete |
| Distributed Cache | Intermediate | Consistent Hashing, Redis Cluster | Complete |
| Message Queue | Intermediate | Kafka, RabbitMQ, SQS | Complete |
| Real-time Messaging | Intermediate | WebSocket, Cassandra, Signal Protocol | Complete |
| News Feed | Intermediate | Fan-out, Timeline Service, Graph DB | Complete |
| Search Autocomplete | Intermediate | Trie (Prefix Tree), Elasticsearch | Complete |
| Video Streaming | Advanced | Adaptive Bitrate, CDN, Transcoding | Complete |
| Ride Sharing | Advanced | Geospatial Index, Real-time Matching | Complete |
| Distributed File System | Advanced | GFS, HDFS, Erasure Coding | Complete |
| Payment System | Advanced | ACID, Idempotency, Reconciliation | Complete |
| Ad Serving Platform | Expert | RTB, ML Inference, Sub-10ms Latency | Complete |
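The Rate Limiter row above names the token bucket algorithm. As a rough illustration of the idea, here is a minimal single-process sketch in Python. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for this example; the table pairs the design with Redis, which this in-memory version does not model.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start full, which permits an initial burst
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        # Credit the tokens earned since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=1.0` and `capacity=2.0`, two back-to-back requests pass, a third is rejected, and one more is admitted after a second has elapsed.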
patterns/
├── communication/
│   ├── api-gateway.md          # Request routing, auth, rate limiting
│   ├── service-mesh.md         # Istio, Linkerd, mTLS
│   ├── async-messaging.md      # Event-driven, pub/sub, queues
│   └── grpc-federation.md      # Schema stitching, federation
│
├── data-management/
│   ├── cqrs.md                 # Command Query Responsibility Segregation
│   ├── event-sourcing.md       # Append-only event log, projections
│   ├── saga-pattern.md         # Distributed transaction coordination
│   ├── outbox-pattern.md       # Reliable event publishing
│   └── change-data-capture.md  # Debezium, database replication
│
├── resilience/
│   ├── circuit-breaker.md      # Failure isolation, fallbacks
│   ├── bulkhead.md             # Resource isolation
│   ├── retry-backoff.md        # Exponential backoff, jitter
│   └── timeout-patterns.md     # Cascading failure prevention
│
└── scaling/
    ├── horizontal-scaling.md   # Stateless services, auto-scaling
    ├── database-sharding.md    # Consistent hashing, range partitioning
    ├── read-replicas.md        # Read scaling, replication lag
    └── caching-strategies.md   # Write-through, write-behind, cache-aside
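The `retry-backoff.md` entry in the tree above covers exponential backoff with jitter. A minimal sketch of one common variant ("full jitter", where each wait is drawn uniformly between zero and an exponentially growing ceiling) might look like the following. The function name `retry` and its parameters are hypothetical, chosen for illustration rather than taken from the pattern document.

```python
import random
import time


def retry(operation, attempts=5, base=0.1, cap=5.0,
          sleep=time.sleep, rng=random.random):
    """Call `operation` until it succeeds or `attempts` calls have failed.

    The wait before retry n is rng() * min(cap, base * 2**n), i.e. a random
    delay under an exponentially growing ceiling that never exceeds `cap`.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for n in range(attempts):
        try:
            return operation()
        except Exception:  # real code should catch only retryable errors
            if n == attempts - 1:
                raise      # out of attempts: surface the last error
            sleep(rng() * min(cap, base * 2 ** n))
```

The `sleep` and `rng` parameters exist only so the behavior can be exercised deterministically; callers would normally leave them at their defaults.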
                                WRITE PATH
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Client    │────>│   Command   │────>│   Domain    │────>│    Event    │
│             │     │   Handler   │     │    Model    │     │    Store    │
└─────────────┘     └─────────────┘     └─────────────┘     └──────┬──────┘
                                                                   │
                                                                   v
┌─────────────────────────────────────────────────────────────────────────┐
│                            EVENT BUS (Kafka)                            │
└─────────────────────────────────────────────────────────────────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        v                       v                       v
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│  Projection   │       │  Projection   │       │  Projection   │
│  (List View)  │       │  (Analytics)  │       │   (Search)    │
└───────┬───────┘       └───────┬───────┘       └───────┬───────┘
        │                       │                       │
        v                       v                       v
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│  PostgreSQL   │       │  ClickHouse   │       │ Elasticsearch │
│ (Read Model)  │       │    (OLAP)     │       │  (Full-text)  │
└───────────────┘       └───────────────┘       └───────────────┘
                                        ^
                                        │
             READ PATH                  │
┌─────────────┐     ┌─────────────┐     │
│   Client    │────>│    Query    │─────┘
│             │     │   Handler   │
└─────────────┘     └─────────────┘
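The write/read split in the diagram above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not the repository's implementation: the `ItemAdded` event, the `EventStore` class (which also stands in for the Kafka bus by notifying subscribers synchronously), and the handler names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    kind: str       # e.g. "ItemAdded"
    payload: dict


@dataclass
class EventStore:
    """Append-only log; also plays the role of the event bus in this sketch."""
    events: List[Event] = field(default_factory=list)
    subscribers: List[Callable[[Event], None]] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)
        for notify in self.subscribers:  # synchronous stand-in for Kafka delivery
            notify(event)


class ListViewProjection:
    """Read model (item name -> quantity) derived purely from events."""

    def __init__(self) -> None:
        self.items: Dict[str, int] = {}

    def apply(self, event: Event) -> None:
        if event.kind == "ItemAdded":
            name = event.payload["name"]
            self.items[name] = self.items.get(name, 0) + event.payload["qty"]


def handle_add_item(store: EventStore, name: str, qty: int) -> None:
    """Command handler: validate, then record what happened as an event."""
    if qty <= 0:
        raise ValueError("qty must be positive")
    store.append(Event("ItemAdded", {"name": name, "qty": qty}))


def query_quantity(view: ListViewProjection, name: str) -> int:
    """Query handler: reads only the projection, never the event store."""
    return view.items.get(name, 0)
```

Because the projection is derived only from events, a new read model can be built at any time by replaying `store.events` through a fresh projection, which is the property that lets the diagram fan out to several stores.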
                                  GLOBAL
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│     ┌─────────────┐         ┌─────────────┐                             │
│     │  Route 53   │         │ CloudFront  │                             │
│     │  (Latency)  │────────>│    (CDN)    │                             │
│     └─────────────┘         └──────┬──────┘                             │
│                                    │                                    │
└────────────────────────────────────┼────────────────────────────────────┘
                                     │
           ┌─────────────────────────┼─────────────────────────┐
           │                         │                         │
           v                         v                         v
┌─────────────────────┐   ┌─────────────────────┐   ┌─────────────────────┐
│      US-EAST-1      │   │      EU-WEST-1      │   │     AP-SOUTH-1      │
│                     │   │                     │   │                     │
│  ┌───────────────┐  │   │  ┌───────────────┐  │   │  ┌───────────────┐  │
│  │      ALB      │  │   │  │      ALB      │  │   │  │      ALB      │  │
│  └───────┬───────┘  │   │  └───────┬───────┘  │   │  └───────┬───────┘  │
│          │          │   │          │          │   │          │          │
│  ┌───────┴───────┐  │   │  ┌───────┴───────┐  │   │  ┌───────┴───────┐  │
│  │      EKS      │  │   │  │      EKS      │  │   │  │      EKS      │  │
│  │    Cluster    │  │   │  │    Cluster    │  │   │  │    Cluster    │  │
│  └───────┬───────┘  │   │  └───────┬───────┘  │   │  └───────┬───────┘  │
│          │          │   │          │          │   │          │          │
│  ┌───────┴───────┐  │   │  ┌───────┴───────┐  │   │  ┌───────┴───────┐  │
│  │    Aurora     │<─┼───┼─>│    Aurora     │<─┼───┼─>│    Aurora     │  │
│  │   Global DB   │  │   │  │    Replica    │  │   │  │    Replica    │  │
│  └───────────────┘  │   │  └───────────────┘  │   │  └───────────────┘  │
│                     │   │                     │   │                     │
│  ┌───────────────┐  │   │  ┌───────────────┐  │   │  ┌───────────────┐  │
│  │  ElastiCache  │  │   │  │  ElastiCache  │  │   │  │  ElastiCache  │  │
│  │    (Redis)    │  │   │  │    (Redis)    │  │   │  │    (Redis)    │  │
│  └───────────────┘  │   │  └───────────────┘  │   │  └───────────────┘  │
│                     │   │                     │   │                     │
└─────────────────────┘   └─────────────────────┘   └─────────────────────┘
| Architecture | Use Case | Documentation | Terraform |
|---|---|---|---|
| 3-Tier Web Application | Standard web apps | Docs | Code |
| Serverless API | Event-driven APIs | Docs | Code |
| Event-Driven Microservices | Decoupled services | Docs | Code |
| Data Lake | Analytics platform | Docs | Code |
| Multi-Region Active-Active | Global availability | Docs | Code |
| Kubernetes Platform | Container orchestration | Docs | Code |
| ML Inference Pipeline | Real-time predictions | Docs | Code |
                        CONSISTENCY
              Strong ──────────────► Eventual
         │
Write    │  ┌───────────────┐  ┌───────────────┐
Heavy    │  │  CockroachDB  │  │   Cassandra   │
         │  │ TiDB, Spanner │  │   ScyllaDB    │
         │  └───────────────┘  └───────────────┘
         │
    T    │
    H    │
    R    │  ┌───────────────┐  ┌───────────────┐
    O    │  │  PostgreSQL   │  │    MongoDB    │
    U    │  │     MySQL     │  │   DynamoDB    │
    G    │  └───────────────┘  └───────────────┘
    H    │
    P    │
    U    │  ┌───────────────┐  ┌───────────────┐
    T    │  │    SQLite     │  │     Redis     │
         │  │  (Embedded)   │  │   Memcached   │
Read     │  └───────────────┘  └───────────────┘
Heavy    │
         │
DATA SOURCES                 INGESTION            PROCESSING
────────────                 ─────────            ──────────
┌─────────────┐
│ Application │───┐
│    Logs     │   │
└─────────────┘   │
                  │       ┌─────────────┐       ┌─────────────┐
┌─────────────┐   │       │             │       │             │
│  Database   │───┼──────>│    Kafka    │──────>│    Flink    │
│     CDC     │   │       │   Connect   │       │    Spark    │
└─────────────┘   │       │             │       │             │
                  │       └─────────────┘       └──────┬──────┘
┌─────────────┐   │                                    │
│     API     │───┘                                    │
│   Events    │                                        │
└─────────────┘                                        │
                                                       │
                                   ┌───────────────────┴───────────────────┐
                                   │                   │                   │
                                   v                   v                   v
                            ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
                            │     S3      │     │ ClickHouse  │     │Elasticsearch│
                            │ (Data Lake) │     │   (OLAP)    │     │  (Search)   │
                            └─────────────┘     └─────────────┘     └─────────────┘
                            CONSUMPTION
                            ───────────
                            ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
                            │   Athena    │     │   Grafana   │     │   Kibana    │
                            │   Presto    │     │  Superset   │     │             │
                            └─────────────┘     └─────────────┘     └─────────────┘
| Category | Metric | Target | Alert Threshold |
|---|---|---|---|
| Latency | P50 | < 50ms | - |
| Latency | P99 | < 200ms | > 500ms |
| Latency | P99.9 | < 1s | > 2s |
| Availability | Uptime | 99.99% | < 99.9% |
| Error Rate | 5xx | < 0.1% | > 1% |
| Saturation | CPU | < 70% | > 85% |
| Saturation | Memory | < 80% | > 90% |
| Saturation | Disk I/O | < 70% | > 85% |
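To make the latency rows above concrete, the sketch below computes percentiles from raw samples and reports which ones cross the alert thresholds in the table. It assumes the nearest-rank percentile definition and latencies in milliseconds; the names `percentile`, `breached_percentiles`, and `LATENCY_ALERT_MS` are illustrative, and a real deployment would more likely read these values from histogram metrics than from raw samples.

```python
import math

# Alert thresholds in milliseconds, copied from the table above.
LATENCY_ALERT_MS = {99: 500, 99.9: 2000}


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("percentile of an empty sample set is undefined")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def breached_percentiles(samples_ms):
    """Return the percentiles whose observed latency exceeds its alert threshold."""
    return [p for p, limit in LATENCY_ALERT_MS.items()
            if percentile(samples_ms, p) > limit]
```

For example, 98 requests at 100 ms plus 2 requests at 800 ms leave the median at 100 ms but push P99 to 800 ms, which breaches the 500 ms alert threshold while P99.9 stays under 2 s.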
SERVICE LEVEL OBJECTIVES
────────────────────────
┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  SLI: Request Latency                                                   │
│  ────────────────────                                                   │
│  Definition: Time from request received to response sent                │
│  Measurement: Histogram at load balancer                                │
│                                                                         │
│  SLO: 99% of requests complete in < 200ms                               │
│  ───                                                                    │
│  Error Budget: 1% (432 minutes/month)                                   │
│                                                                         │
│  Current Status:                                                        │
│  ┌──────────────────────────────────────────────────┐                   │
│  │███████████████████████████████████████░░░░░░░░░░░│ 78%               │
│  └──────────────────────────────────────────────────┘                   │
│  Error Budget Remaining: 95 minutes                                     │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
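The error budget figures above follow from simple arithmetic, sketched here in Python. Two assumptions are made: the month is taken as 30 days (43,200 minutes), and the 78% status is read as the share of budget already consumed, which is the reading consistent with 95 of 432 minutes remaining. The function names are hypothetical.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed SLO violation in the window (slo=0.99 allows 1% of it)."""
    return (1.0 - slo) * window_days * 24 * 60


def remaining_budget_minutes(slo: float, consumed_fraction: float,
                             window_days: int = 30) -> float:
    """Budget left after `consumed_fraction` (0.0 to 1.0) has been spent."""
    return error_budget_minutes(slo, window_days) * (1.0 - consumed_fraction)


if __name__ == "__main__":
    print(f"budget:    {error_budget_minutes(0.99):.0f} minutes")
    print(f"remaining: {remaining_budget_minutes(0.99, 0.78):.0f} minutes")
```

A 99% SLO over 30 days gives 0.01 × 43,200 = 432 minutes, and with 78% consumed, 0.22 × 432 ≈ 95 minutes remain.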
terraform/
├── modules/
│   ├── networking/
│   │   ├── vpc/
│   │   ├── subnets/
│   │   └── security-groups/
│   │
│   ├── compute/
│   │   ├── eks/
│   │   ├── ecs/
│   │   └── lambda/
│   │
│   ├── database/
│   │   ├── aurora/
│   │   ├── dynamodb/
│   │   └── elasticache/
│   │
│   └── observability/
│       ├── cloudwatch/
│       ├── prometheus/
│       └── grafana/
│
├── environments/
│   ├── dev/
│   ├── staging/
│   └── production/
│
└── examples/
    ├── 3-tier-webapp/
    ├── serverless-api/
    └── data-lake/
| Operation | Time | Notes |
|---|---|---|
| L1 cache reference | 0.5 ns | |
| Branch mispredict | 5 ns | |
| L2 cache reference | 7 ns | 14x L1 cache |
| Mutex lock/unlock | 25 ns | |
| Main memory reference | 100 ns | ~14x L2 cache, 200x L1 cache |
| Compress 1KB with Snappy | 3,000 ns | 3 us |
| Send 1KB over 1 Gbps | 10,000 ns | 10 us |
| Read 4KB randomly from SSD | 150,000 ns | 150 us |
| Read 1MB sequentially from memory | 250,000 ns | 250 us |
| Round trip within datacenter | 500,000 ns | 500 us |
| Read 1MB sequentially from SSD | 1,000,000 ns | 1 ms |
| Disk seek | 10,000,000 ns | 10 ms |
| Read 1MB sequentially from disk | 20,000,000 ns | 20 ms |
| Send packet CA -> Netherlands -> CA | 150,000,000 ns | 150 ms round trip |
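One way to use the table above is to scale its per-megabyte figures up to a realistic payload. The sketch below estimates how long a sequential read takes on each medium; it assumes cost grows linearly with size and ignores seek and setup overhead, and the names `SEQUENTIAL_READ_NS_PER_MB` and `sequential_read_seconds` are made up for this example.

```python
# Sequential-read cost per MB in nanoseconds, taken from the table above.
SEQUENTIAL_READ_NS_PER_MB = {
    "memory": 250_000,
    "ssd": 1_000_000,
    "disk": 20_000_000,
}


def sequential_read_seconds(medium: str, megabytes: int) -> float:
    """Estimated time to stream `megabytes` MB from the given medium."""
    return SEQUENTIAL_READ_NS_PER_MB[medium] * megabytes / 1e9


if __name__ == "__main__":
    for medium in SEQUENTIAL_READ_NS_PER_MB:
        seconds = sequential_read_seconds(medium, 1024)
        print(f"1 GB from {medium}: {seconds:.3f} s")
```

Under these figures, streaming 1 GB (1,024 MB) takes roughly 0.26 s from memory, 1 s from SSD, and 20 s from spinning disk.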
Daily Active Users (DAU) = MAU * 0.2 (rule of thumb; the ratio varies by product)
Average RPS = (DAU * avg_requests_per_user_per_day) / 86,400
Peak RPS = Average RPS * 3 (rule of thumb)
Storage per Year = users * data_per_user_per_day * 365 * replication_factor
Bandwidth = Average RPS * average_response_size
Servers Required = ceil((Peak RPS / RPS_per_server) * (1 + redundancy_factor))
Cache Size = Working Set * 0.2 (rule of thumb: the hottest 20% of data serves about 80% of reads)
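A worked example helps check the traffic formulas above. The sketch below covers the DAU, RPS, bandwidth, and server-count formulas only. The function name `estimate_capacity`, its parameters, and the sample inputs are assumptions for illustration, and the server count is rounded up because a fraction of a server cannot be provisioned.

```python
import math


def estimate_capacity(mau, requests_per_user_per_day, response_bytes,
                      rps_per_server, dau_ratio=0.2, peak_factor=3,
                      redundancy_factor=0.5):
    """Apply the rule-of-thumb formulas and return the derived figures."""
    dau = mau * dau_ratio
    average_rps = dau * requests_per_user_per_day / 86_400  # seconds per day
    peak_rps = average_rps * peak_factor
    bandwidth_bytes_per_s = average_rps * response_bytes
    servers = math.ceil(peak_rps / rps_per_server * (1 + redundancy_factor))
    return {
        "dau": dau,
        "average_rps": average_rps,
        "peak_rps": peak_rps,
        "bandwidth_bytes_per_s": bandwidth_bytes_per_s,
        "servers": servers,
    }


if __name__ == "__main__":
    result = estimate_capacity(mau=43_200_000, requests_per_user_per_day=10,
                               response_bytes=2_000, rps_per_server=400)
    for name, value in result.items():
        print(f"{name}: {value:,.0f}")
```

With 43.2M MAU and 10 requests per user per day, the formulas give 8.64M DAU, 1,000 average RPS, 3,000 peak RPS, about 2 MB/s of response bandwidth, and 12 servers at 400 RPS each with 50% redundancy (3,000 / 400 × 1.5 = 11.25, rounded up).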
Contributions are welcome. Please review the contribution guidelines before submitting.
- Fork the repository
- Create a feature branch
- Make your changes with appropriate documentation
- Submit a pull request with a clear description
MIT License - see LICENSE for details.
Maintained by Richard