Design For Scale and High Availability – What Does 100 Million Users On A Google Service Mean?
Imagine a simple service with an API server and a database. Now when this service is owned by Google, the traffic is expected to be in the scale of 100 million users. That means: 10 billion requests/day 100k requests/second (average) 200k requests/second (peak) 2 million…
How Razorpay handled significant transaction bursts during events like IPL
Razorpay is India’s leading payment-gateway service that offers a suite of products to businesses to accept, process and disburse payments enabling them to establish an online presence. They dealt with an interesting problem: the problem of sudden transaction bursts during events such as the Indian…
Facebook’s Photo Storage Architecture
Facebook built Haystack, an object storage system designed for storing photos on a large scale. The platform stores over 260 billion images which amounts to over 20 petabytes of data. One billion new photos are uploaded each week which is approx—60 terabytes of data. At…
Network file system – A deep dive
A network file system is a distributed file system protocol; an open protocol that specifies message formats regarding how a client and a server should communicate in a distributed environment. NFS being an open protocol enables third parties to write their own implementations of it….
Application Hosting – How PolyHaven Manages 5 Million Page Views and 80TB Traffic a Month for < $400
Polyhaven, a 3D asset library, recently provided insightful data on how they manage to serve 5 million page views a month, consuming a bandwidth of 80 terabytes just under 400 USD. If the same amount of traffic was served from AWS S3 object storage, that would cost more than 4K USD a month. So how do…
Distributed Systems and Scalability Feed
Facebook photo storage architecture
Facebook built Haystack, an object storage system designed for storing photos on a large scale. The platform stores over 260 billion images which amounts to over 20 petabytes of data. One billion new photos are uploaded each week which is approx—60 terabytes of data. At peak, the platform serves over one million images per second.
In the original NAS-based photo storage architecture, Facebook faced throughput and latency issues as the photos and the associated metadata lookups in NAS caused excessive disk operations almost upto ten just for retrieving a single image.
Tail latency in distributed systems
Tail latency is that tiny percentage of responses from a system that are the slowest in comparison to most of the responses. They are often called as the 98th or 99th percentile response times. This may seem insignificant at first but for large applications like LinkedIn, this has noticeable effects. This could mean that for a page having a million views per day 10,000 of those page views would experience the delay. Read how LinkedIn deals with longtail network latencies.
There can be multiple causes of tail latency: increasing load on the system, complex and distributed systems, application bottlenecks, slow network, slow disk access and more. Read more on it.
RobinHood: Tail latency-aware caching
RobinHood is a research caching system for application servers in large distributed systems having diverse backends. The cache system dynamically partitions the cache space between different backend services and continuously optimizes the partition sizes.
Microsoft research has a talk on getting rid of long-tail latencies.
> Spotify Engineering: From Live to Recording
> Ingesting LIVE video streams at a global scale at Twitch
> $64,944 spent on AWS, to support 25,000 customers, in August by ConvertKit.
> Read how Storytel engineering computes customer consumption of books transitioning from batch processing to streaming bookmarks data with Apache Beam and Google Cloud.
> How Pokemon Go scales to millions of requests per second?
> Insight into how Grab built a high-performance ad server.
SUBSCRIBE TO MY NEWSLETTER to be notified of new additions to the list. Fortnight/monthly emails.
Looking for developer, software architect jobs? Try Jooble. Jooble is a job search engine created for a single purpose: To help you find the job of your dreams!!
- State of Backend #2 – Disney+ Hotstar Replaced Redis and Elasticsearch with ScyllaDB. Here’s Why.
- State of Backend #1- Distributed Task Scheduling with Akka, Kafka and Cassandra
- Live Video Streaming Infrastructure at Twitch
- Web Application Architecture Explained With a Real-World Example
- Wide-column Database, Column Databases – A Deep Dive