A FAULT-TOLERANT, SHARDED KEY-VALUE STORAGE SERVICE
Over the course of three months, I built a fault-tolerant, sharded key-value system almost entirely from scratch. The project splits into the three subsystems below; follow each link to read that subsystem's readme.
Fault-Tolerant Key-Value Storage: I used my Raft library to build a key-value service that is replicated across multiple servers for fault tolerance.
Sharded Key-Value Storage: I extended my key-value service to shard keys across multiple replica groups and to let their configuration be changed while the servers are live.
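To make the sharding idea concrete, here is a minimal sketch in Go of how keys might map to shards and shards to replica groups. It is an illustration under stated assumptions, not the code from this project: the names `NShards`, `Config`, `KeyToShard`, `Rebalance`, and `GroupFor` are hypothetical, the FNV hash is an arbitrary choice, and the round-robin assignment is the simplest possible policy (a real service would likely try to move as few shards as possible when the set of groups changes).

```go
package main

import (
	"hash/fnv"
	"sort"
)

// NShards is a hypothetical fixed number of shards (an assumption for this sketch).
const NShards = 10

// Config is a hypothetical snapshot of which replica group owns each shard.
type Config struct {
	Num    int          // configuration number, incremented on every change
	Shards [NShards]int // Shards[s] is the ID of the group owning shard s; 0 means unassigned
}

// KeyToShard hashes a key onto a shard index in [0, NShards).
func KeyToShard(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % NShards)
}

// Rebalance returns the next configuration, spreading shards round-robin
// over the given group IDs. This ignores the old assignment entirely, so it
// may move more shards than strictly necessary.
func Rebalance(old Config, groups []int) Config {
	next := Config{Num: old.Num + 1}
	if len(groups) == 0 {
		return next // no groups: every shard stays unassigned
	}
	ids := append([]int(nil), groups...)
	sort.Ints(ids) // sorted order so every replica computes the same assignment
	for s := range next.Shards {
		next.Shards[s] = ids[s%len(ids)]
	}
	return next
}

// GroupFor returns the ID of the group that should serve key under cfg.
func GroupFor(cfg Config, key string) int {
	return cfg.Shards[KeyToShard(key)]
}
```

A client in this sketch would call `GroupFor` with its latest known `Config` to decide which replica group to contact, and would fetch a newer configuration and retry if that group reports it no longer owns the shard. Because `Rebalance` is deterministic, replicas that apply the same sequence of configuration changes end up with the same assignment.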
To validate our implementations, we were provided with tests that simulated server failures, network partitions, unreliable networks, and many other situations and edge cases. Because the services above are interdependent, a bug in any one of them can cause failures in the others. As a result, I spent most of my time debugging: poring over debug logs of 100,000+ lines and tracking down deadlocks, livelocks, inconsistent logs, and similar problems.
I built the system as part of MIT’s 2016 Distributed Systems course (6.824). The course is (in)famous for being one of the most demanding CS courses at MIT, if not the most demanding.