How Hotstar Scaled With 10.3 Million Concurrent Users – An Architectural Insight
In the 2019 edition of the IPL T20 cricket tournament, Hotstar received a whooping record traffic of 10.3 million concurrent users. This smashed its previous record of 8.26 million concurrent users.
To my knowledge this is the most intense concurrent traffic load on any service, surpassing Fortnite which clocked a load of 8.3 million concurrent players in the recent past.
Let’s have a quick peek into the backend engineering that powers the streaming platform to support such large traffic.
1. Technical Insights
All the traffic is served by the EC2 instances & S3 Object store is used as the data store. The services use a mix of on-demand & spot instances to keep the costs optimized.
Running machine learning & data analytics algorithms on spot instances helps the business cut down costs by quite an extent.
Terabytes of data in double digits is generated on any regular day and is processed by the AWS EMR Clusters.
AWS EMR is a managed Hadoop framework for processing massive amounts of data across EC2 instances. Other popular data processing frameworks like Apache Spark, Presto, HBase etc. can also be used with AWS EMR.
2. Infrastructure Setup for Load Testing the Platform for the Big Event
Speaking of the infrastructure setup for the load testing of the platform. It is 500+ AWS CPU instances which are C4.4X Large or C4.8X Large running at 75% utilization.
C4X instances are built for very high CPU-intensive operations with a low price-per-compute ratio. They provide high networking and increased storage performance at no additional cost.
C4.4X instances typically have 30 Gigs of RAM & C4.8X 60 Gigs of RAM. The entire setup has 16 TBs of RAM and 8000 CPU cores.
3. Simulating Traffic
The traffic simulation had three major components: the traffic model, the simulation script and the load generation infrastructure. I’ve already discussed the infrastructure part.
3.1 Preparing the Traffic Model
The engineering team started with a very basic traffic model, hitting the API endpoints with a certain ratio of requests. Performed some initial runs.
There are typically two types of user interactions with the system: users who have already signed up and are logging in and users visiting the platform for the first time.
Keeping all the different user navigation scenarios in mind, the team prepared a traffic model for the platform.
Now the time was to introduce spikes in the model over the entire testing phase.
3.2 Writing the Simulation Script
A few tools that were used to write the simulation are Gatling and Flood.io.
Gatling is an open-source load, performance testing tool for web applications built on Scala, Akka & Netty.
It was preferred over JMeter, which is a pretty popular testing tool, due to its capability of simulating more concurrent users at a point in time.
Gatling sessions were used to maintain the user state. Simulation scripts were designed for several workflows throughout the system, for instance, user login flow, streaming flow etc.
The simulation ran on an AWS distributed cluster and simulated 20 million users out of which 4 million were concurrent.
Besides Gatling, Flood.io was also used for load testing. It is a load-testing platform that runs distributed performance tests with open-source tools like JMeter, Gatling, Selenium and so on.
Over time, all this test setup matured with the data obtained as the real cricket matches happened.
4. Strategies to Scale
4.1 Traffic & Ladder-Based Scaling
There are primarily two triggers to scale: Traffic-based & Ladder based
Traffic-based scaling simply meant adding new infrastructure to the pool as the number of requests being processed by the system increased.
With the ladder-based scaling approach, the infrastructure team has pre-defined ladders per million concurrent users. As the number adds on, new infrastructure is added to the pool.
The team has a concurrency buffer of 2 million concurrent users in place and adding new infrastructure to the pool takes around 90 seconds and the container and the application start takes around 75 seconds. So, the bootup time is taken into account as well.
The infrastructure buffer approach is preferred by the team in contrast to auto-scaling since it helps them handle unexpected traffic spikes without adding unnecessary latency to the response.
To manage all this, they have developed an internal app called the Infradashboard which helps them take these scaling decisions smoothly and quite ahead of time.
4.2 Intelligent Client
Caching and intelligent protocols are in place which come into effect when the backend is overwhelmed with requests and the application client experiences increased latency in response.
The client avoids burdening the backend servers even more in these kinds of scenarios by increasing the time between subsequent requests.
4.3 Dealing with Nefarious Traffic
Popularity attracts unwanted attention. The application with a combination of whitelisting and industry best practices drives away nefarious traffic as soon as it is discovered avoiding the burden on the servers.
4.4 Do Not Do Real-time If It’s Not Business Critical
Hotstar introduced a play-along feature where the users can interact with their friends, play trivia and such on the same platform watching cricket. This feature being real-time spiked the concurrent traffic load on their backend servers starkly.
Doing stuff in real-time has its costs associated. Real-time features spike concurrent traffic to quite an extent and with resource consumption. And to manage concurrent traffic, we need powerful hardware which can be avoided if the feature isn’t real-time.
All these experiences are pretty insightful and help us plan, design and execute in a better and more informed way.
Recommended Read: Hotstar Replaced Redis and Elasticsearch with ScyllaDB to Implement the ‘Continue Watching’ Feature
Check out the Zero to Mastering Software Architecture learning path, a series of three courses I have written intending to educate you, step by step, on the domain of software architecture and distributed system design. The learning path takes you right from having no knowledge in it to making you a pro in designing large-scale distributed systems like YouTube, Netflix, Hotstar, and more.
Zero to Mastering Software Architecture Learning Track - Starting from Zero to Designing Web-Scale Distributed Applications Like a Pro. Check it out.
Master system design for your interviews. Check out this blog post written by me.
- System Design: Hone Your System Design Skills By Exploring Real-World Web-Scale System Architectures [Feed Updated Daily]
- Single-threaded Event Loop Architecture for Building Asynchronous, Non-Blocking, Highly Concurrent Real-time Services
- Understanding SLA (Service Level Agreement) In Cloud Services: How Is SLA Calculated In Large-Scale Services?
- Database Architecture – Part 2 – NoSQL DB Architecture with ScyllaDB (Shard Per Core Design)
- Parallel Processing: How Modern Cloud Servers Leverage Different System Architectures to Optimize Parallel Compute
- Database Architecture – A Deep Dive – Part 1