‘Futures and Promises’ and how Instagram leverages it for better resource utilization
Futures and Promises is a concept that enables a process to execute asynchronously, improving performance and resource consumption. It can be applied in multiple contexts, such as in request-response in a web service call, long-running computations, database queries, remote procedure calls, interservice communication in distributed…
McDonald’s Event-Driven Architecture – A Gist
McDonald’s uses events across their distributed architecture for use cases such as asynchronous, transactional and analytical processing, including mobile-order progress tracking and sending marketing offers like deals and promotions to their customers. As opposed to building event streaming solutions for every individual or a set…
State of Backend #2 – Disney+ Hotstar Replaced Redis and Elasticsearch with ScyllaDB. Here’s Why.
To get the content delivered to your inbox, you can subscribe to the newsletter at the end of this post. Here is the previous issue. Disney+ Hotstar Replaced Redis and Elasticsearch with ScyllaDB to Implement the ‘Continue Watching’ Feature. Here’s Why: The continue watching feature…
State of Backend #1- Distributed Task Scheduling with Akka, Kafka and Cassandra
This is the first issue of the newsletter I’ve kickstarted for insights into the backend engineering space encircling topics like distributed systems, databases, data engineering, system design, architecture, scalability and the like, also the latest tools and technologies in the space. To get the content…
Live Video Streaming Infrastructure at Twitch
Twitch engineering developed a scalable, HA live streaming solution to enable broadcasters on their platform live stream their gameplay with minimum latency. The live streaming solution (developed by Twitch) is also made available as a Service to the world through AWS IVS (Interactive Video Streaming)…
Distributed Systems and Scalability Feed
Facebook photo storage architecture
Facebook built Haystack, an object storage system designed for storing photos on a large scale. The platform stores over 260 billion images which amounts to over 20 petabytes of data. One billion new photos are uploaded each week which is approx—60 terabytes of data. At peak, the platform serves over one million images per second.
In the original NAS-based photo storage architecture, Facebook faced throughput and latency issues as the photos and the associated metadata lookups in NAS caused excessive disk operations almost upto ten just for retrieving a single image.

Tail latency in distributed systems
Tail latency is that tiny percentage of responses from a system that are the slowest in comparison to most of the responses. They are often called as the 98th or 99th percentile response times. This may seem insignificant at first but for large applications like LinkedIn, this has noticeable effects. This could mean that for a page having a million views per day 10,000 of those page views would experience the delay. Read how LinkedIn deals with longtail network latencies.
There can be multiple causes of tail latency: increasing load on the system, complex and distributed systems, application bottlenecks, slow network, slow disk access and more. Read more on it.
RobinHood: Tail latency-aware caching
RobinHood is a research caching system for application servers in large distributed systems having diverse backends. The cache system dynamically partitions the cache space between different backend services and continuously optimizes the partition sizes.
Microsoft research has a talk on getting rid of long-tail latencies.
Zero to Software Architect Learning Track - Starting from Zero to Designing Web-Scale Distributed Applications Like a Pro. Check it out.
Master system design for your interviews. Check out this blog post written by me.
Zero to Software Architect Learning Track - Starting from Zero to Designing Web-Scale Distributed Applications Like a Pro. Check it out.
Recent Posts
- System Design #3: Leveraging the Backends for frontends pattern to avert API gateway from becoming a system bottleneck
- System Design #2: Understanding API gateway and the need for it
- System Design #1: CDN and Load balancers (Understanding the request flow)
- System Design #5: How Actor model/Actors run in clusters facilitating asynchronous communication in distributed systems
- System Design#4: Understanding the Actor model to build non-blocking, high-throughput distributed systems
Follow Me On Social Media