System Design
Books
- Designing Data-Intensive Applications (2017) by Martin Kleppmann
- Domain-Driven Design: Tackling Complexity in the Heart of Software (2003) by Eric Evans
- Functional and Reactive Domain Modeling (2016) by Debasish Ghosh
- Versioning in an Event Sourced System
- Exploring CQRS and Event Sourcing
- Database Internals: A Deep Dive into How Distributed Data Systems Work (2019) by Alex Petrov
- The Architecture of Open Source Applications (free)
Resources
- MIT 6.824: Distributed Systems (course)
- Distributed Systems lecture series by Martin Kleppmann (course)
- Software Architecture Monday (videos)
- CQRS by Martin Fowler
- Clarified CQRS
- 1 Year of Event Sourcing and CQRS
- Eventually Consistent - Revisited
- How do CRDTs solve distributed data consistency challenges?
- Are CRDTs suitable for shared editing?
- On Designing and Deploying Internet-Scale Services
- There is No Now
- Online Event Processing
- The world beyond batch: Streaming 101
- Questioning the Lambda Architecture
- The Difference between SLI, SLO, and SLA
- A review of consensus protocols
- How you could have come up with Paxos yourself
- Implementing Raft's Leader Election in Rust
- Consensus Protocol
- Implementing Raft for Browsers with Rust and WebRTC
- HTTP Feeds
- Autopilot Pattern Applications
- REST Hooks
Blogs
CAP
- Brewer's CAP Theorem
- CAP Twelve Years Later: How the "Rules" Have Changed
- Please stop calling databases CP or AP
- The CAP FAQ
- You Can't Sacrifice Partition Tolerance
Papers
- Foundational distributed systems papers (collection)
- Distributed Systems Reading List (collection)
- Best Paper Awards in Computer Science (collection)
- Ask HN: Recommended books and papers on distributed systems? (collection)
- The Google File System
- MapReduce: Simplified Data Processing on Large Clusters
- Raft: In Search of an Understandable Consensus Algorithm
- Paxos Made Simple
- Zab: A simple totally ordered broadcast protocol
- The Chubby lock service for loosely-coupled distributed systems
- Spanner: Google's Globally-Distributed Database
- Dynamo: Amazon's Highly Available Key-value Store
- HyperLogLog in Practice
- Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
- Large-scale cluster management at Google with Borg
- Linearizability: A Correctness Condition for Concurrent Objects
- Harvest, Yield, and Scalable Tolerant Systems
- Life beyond Distributed Transactions (webarchive)
- The ϕ Accrual Failure Detector (webarchive)
- Conflict-free Replicated Data Types
- FLP - Impossibility of Distributed Consensus with One Faulty Process (webarchive)
- SEDA: An Architecture for Well-Conditioned, Scalable Internet Services
- Pregel: A System for Large-Scale Graph Processing
- Hashed and Hierarchical Timing Wheels
- Merkle Hash Tree based Techniques for Data Integrity of Outsourced Data
- What Every Programmer Should Know About Memory
- Fallacies of Distributed Computing Explained (webarchive)
- The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
- A Dataset of Dockerfiles